PCT 

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 6:

A01N 63/00, A61K 39/395, C12N 15/00, 
A01N 61/00, C07H 21/02 



(11) International Publication Number: WO 99/49735 

(43) International Publication Date: 7 October 1999 (07.10.99) 



(21) International Application Number: PCT/US99/06644

(22) International Filing Date: 26 March 1999 (26.03.99)



(30) Priority Data:

60/079,759    27 March 1998 (27.03.98)
60/095,153    3 August 1998 (03.08.98)



(81) Designated States: AU, CA, JP, US, European patent (AT, BE,
CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC,
NL, PT, SE).



Published 

With international search report. 



(71) Applicant (for all designated States except US): FOX CHASE
CANCER CENTER [US/US]; 7701 Burholme Avenue,
Philadelphia, PA 19111 (US).

(72) Inventors; and 

(75) Inventors/Applicants (for US only): KRUH, Gary [US/US]; 
241 South 6th Street #809, Philadelphia, PA 19106 (US). 
LEE, Kun [KR/US]; 21 Barrington Drive, Cranbury, NJ 
08512 (US). BELINSKY, Martin [US/US]; 625 Parmentier 
Road, Warminster, PA 18974 (US). BAIN, Lisa [US/US]; 
284 Penny Lane, Townville, SC 29689 (US). 

(74) Agents: RIGAUT, Kathleen, D. et al.; Dann, Dorfman, Herrell 
and Skillman, Suite 720, 1601 Market Street, Philadelphia, 
PA 19103 (US). 



(54) Title: MRP-RELATED ABC TRANSPORTER ENCODING NUCLEIC ACIDS AND METHODS OF USE THEREOF
(57) Abstract 

Novel human MOAT genes and their encoded proteins are provided herein. The MRP-related ABC transporters encoded by the 
disclosed nucleic acid sequences play a pivotal role in the efflux of pharmacologically beneficial reagents from tumor cells. MOAT genes
and their encoded proteins provide valuable therapeutic targets for the design of anti-cancer agents which inhibit the aberrant growth of 
malignant cells. 



7/15/2008, EAST Version: 2.2.1.0 












WO 99/49735 



PCT/US99/06644 



MRP-Related ABC Transporter 
Encoding Nucleic Acids and Methods of Use Thereof 



Pursuant to 35 U.S.C. §202(c) it is acknowledged that
the U.S. Government has certain rights in the invention
described herein, which was made in part with funds from
the National Institutes of Health, Grant Numbers CA63173
and CA06927.

FIELD OF THE INVENTION 

The present invention relates to the fields of 
medicine and molecular biology. More specifically, the 
invention provides nucleic acid molecules and proteins 
encoded thereby which are involved in the development of 
resistance to pharmacological and chemotherapeutic agents 
in tumor cells. 

BACKGROUND OF THE INVENTION 

Several publications are referenced in this 
application in parentheses in order to more fully describe 
the state of the art to which this invention pertains. 
The disclosure of each of these publications is 
incorporated by reference herein. 

P-glycoprotein, the product of the MDR1 gene, was the 
first ABC transporter shown to confer resistance to 
cytotoxic agents. Pgp functions as an ATP-dependent 
efflux pump that reduces the intracellular concentration 
of a variety of chemotherapeutic agents by transporting 
them across the plasma membrane (1). The multidrug
resistance phenotype associated with overexpression of Pgp
is of considerable clinical interest because natural
product drugs are second only to alkylating agents in
clinical utility, and many effective chemotherapeutic
regimens contain more than one natural product agent.
More recently, we and others have reported transfection
studies indicating that MRP, another ABC family
transporter, confers a multidrug resistance phenotype that
includes many natural product drugs, but is distinct from
the resistance phenotype associated with Pgp (2-6). MRP
shares only limited amino acid identity with Pgp, and this
is reflected in the different substrate specificities of
the two transporters. In contrast to Pgp, MRP can
transport a wide range of anionic organic conjugates,
including glutathione S-conjugates (7). In addition to
Pgp and MRP there may be other transporters that are
involved in cytotoxic drug resistance. In the case of
natural product drugs, resistant cell lines have been
described that display a multidrug resistant phenotype
associated with a drug accumulation deficit, but do not
overexpress Pgp or MRP (8). ABC transporters have also
been linked to cisplatin resistance, and several lines of
evidence suggest the possibility that pumps specific for
organic anions may be involved: 1) decreased cisplatin
accumulation is consistently observed in cisplatin
resistant cell lines (9); 2) cisplatin is conjugated to
glutathione in the cell, and this anionic conjugate is
toxic in an in vitro biochemical assay (10); and 3)
biochemical studies using membrane vesicle preparations
have shown that cisplatin resistant cell lines have
enhanced expression of an ATP-dependent transporter of
CDDP-glutathione and other glutathione S-conjugates such
as the cysteinyl leukotriene LTC4 (11, 12). These data thus
suggest that an organic anion transporter may contribute
to cisplatin resistance by exporting CDDP-glutathione.
While MRP is an organic anion transporter, the reported
drug resistance profile of MRP-transfected cells does not
extend to this agent (5, 6), and to date only one
cisplatin resistant cell line has been reported to
overexpress MRP (13). This suggests that organic anion
transporters other than MRP may contribute to cisplatin
resistance. Consistent with this possibility, the
canalicular multispecific organic anion transporter,
cMOAT, an MRP-related transporter that functions as the
major organic anion transporter in liver, has been
reported to be overexpressed in cisplatin resistant cell
lines (14, 15). A more direct link between cMOAT and
cytotoxic drug resistance is suggested by a recent report
in which transfection of a cMOAT antisense construct into
a liver cancer cell line resulted in sensitization to
cisplatin, daunorubicin and other cytotoxic agents (16).

Clearly, a need exists for identifying the essential 
components and mechanisms giving rise to drug resistance 
and the transport of anticancer agents out of the tumor 
cell. The elucidation of these mechanisms may be used to 
advantage for the design of efficacious chemotherapeutic
agents.

SUMMARY OF THE INVENTION 

This invention provides novel biological molecules
useful for identification, detection, and/or molecular 
characterization of components involved in the acquisition 
of drug resistance in tumor cells. According to one 
aspect of the invention, an isolated nucleic acid molecule 
is provided which includes a sequence encoding a protein 
transporter of a size between about 1300 and 1350 amino 
acids in length. The encoded protein, referred to herein 
as MOAT-B, comprises a multi-domain structure including a
tandem repeat of nucleotide binding folds appended 
C-terminal to a hydrophobic domain that contains several 
potential membrane spanning helices. Conserved Walker A 
and B ATP binding sites are present in each of the 
nucleotide binding folds. 
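
As an illustration of how a conserved Walker A site of the kind mentioned above might be located in a predicted protein sequence, the following minimal sketch scans for the widely used consensus G-X(4)-G-K-[S/T]. The consensus pattern, the function name and the example peptide are assumptions introduced for illustration only; they are not taken from the MOAT sequences disclosed herein.

```python
import re

# Widely used Walker A (P-loop) consensus: G-X(4)-G-K-[S/T].
# Assumption: this generic pattern stands in for whatever motif definition
# was applied to the MOAT sequences; the text does not specify one.
# The lookahead lets overlapping candidate sites be reported as well.
WALKER_A = re.compile(r"(?=(G.{4}GK[ST]))")


def find_walker_a(protein):
    """Return (1-based start position, matched text) for each candidate site."""
    return [(m.start() + 1, m.group(1)) for m in WALKER_A.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical peptide, not a MOAT sequence.
    print(find_walker_a("MKLLGPNGSGKSTLLRAI"))
```

A hit from such a scan is only a candidate site; whether it lies within a nucleotide binding fold would still have to be judged from an alignment such as those shown in the figures.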

In a preferred embodiment of the invention, an 
isolated nucleic acid molecule is provided that includes a 
cDNA encoding a human MOAT-B protein. In a particularly 
preferred embodiment, the human MOAT-B protein has an 
amino acid sequence the same as Sequence I.D. No. 2. An
exemplary MOAT-B nucleic acid molecule of the invention 
comprises Sequence I.D. No. 1. 

According to another aspect of the invention, a 
second isolated nucleic acid molecule is provided which 
includes a sequence encoding a transporter between about 
1400 and 1450 amino acids. The encoded protein, referred 
to herein as MOAT-C, contains a multi-domain structure
including a tandem repeat of nucleotide binding folds 
appended C-terminal to a hydrophobic domain that contains 
several potential membrane spanning helices. Conserved 
Walker A and B ATP binding sites are present in each of 
the nucleotide binding folds. While similar in structure 
to MOAT-B described above, MOAT-C contains distinct 
sequence differences. 

In a preferred embodiment of the invention, an 
isolated nucleic acid molecule is provided that includes a 
cDNA encoding a human MOAT-C protein. In a particularly 
preferred embodiment, the human MOAT-C protein has an 
amino acid sequence the same as Sequence I.D. No. 4. An 
exemplary MOAT-C nucleic acid molecule of the invention 
comprises Sequence I.D. No. 3. 

According to yet another aspect of the invention, an 
isolated nucleic acid molecule is provided which includes
a sequence encoding a protein of a size between about 1500
and 1550 amino acids in length. The encoded protein, 
referred to herein as MOAT-D, contains a multidomain 
structure including an N-terminal hydrophobic extension 
which harbors five transmembrane spanning helices. 

In a preferred embodiment of the invention, an 
isolated nucleic acid molecule is provided that includes a 
cDNA encoding a MOAT-D protein. In a particularly 
preferred embodiment, the human MOAT-D protein has an 
amino acid sequence the same as Sequence I.D. No. 6. An 
exemplary MOAT-D nucleic acid molecule of the invention 
comprises Sequence I.D. No. 5. 

According to yet another aspect of the invention, an 
isolated nucleic acid molecule is provided which includes 
a sequence encoding a protein of a size between about 1480 
and 1530 amino acids in length. The encoded protein,
referred to herein as MOAT-E, contains a multidomain
structure including an N-terminal hydrophobic extension
which harbors several transmembrane spanning helices.
While similar in structure to MOAT-D described above, 
MOAT-E contains distinct sequence differences. 

In a preferred embodiment of the invention, an 
isolated nucleic acid molecule is provided that includes a 
cDNA encoding a MOAT-E protein. In a particularly 
preferred embodiment, the human MOAT-E protein has an 
amino acid sequence the same as Sequence I.D. No. 8. An 
exemplary MOAT-E nucleic acid molecule of the invention 
comprises Sequence I.D. No. 7. 

According to another aspect of the present invention, 
an isolated nucleic acid molecule is provided, which has a 
sequence selected from the group consisting of: (1)
Sequence I.D. No. 1; (2) a sequence specifically
hybridizing with preselected portions or all of the
complementary strand of Sequence I.D. No. 1 comprising
nucleic acids encoding amino acids 1-1154 of Sequence I.D.
No. 2; (3) a sequence encoding preselected portions of
Sequence I.D. No. 1 within nucleotides 1-3462; (4)
Sequence I.D. No. 3; (5) a sequence specifically
hybridizing with preselected portions or all of the
complementary strand of Sequence I.D. No. 3 comprising
nucleic acids encoding amino acids 1-442 of Sequence I.D.
No. 4; (6) a sequence encoding preselected portions of
Sequence I.D. No. 3 within nucleotides 1-1326; (7)
Sequence I.D. No. 5; (8) a sequence specifically
hybridizing with preselected portions or all of the
complementary strand of Sequence I.D. No. 5 comprising
nucleic acids encoding amino acids 1-1036 of Sequence I.D.
No. 6; (9) a sequence encoding preselected portions of
Sequence I.D. No. 5 within nucleotides 1-3108; (10)
Sequence I.D. No. 7; (11) a sequence specifically
hybridizing with preselected portions or all of the
complementary strand of Sequence I.D. No. 7 comprising
nucleic acids encoding amino acids 1-998 of Sequence I.D.
No. 8; and (12) a sequence encoding preselected portions of
Sequence I.D. No. 7 within nucleotides 1-300.
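
The amino acid ranges and nucleotide ranges paired above for Sequence I.D. Nos. 1, 3 and 5 are related by the three-nucleotides-per-codon rule. The following minimal sketch shows that arithmetic. It assumes that the reading frame begins at nucleotide 1 of the sequence; the function name is hypothetical and is not part of the disclosure.

```python
def coding_nucleotide_range(first_residue, last_residue):
    """Map a 1-based amino acid range to the 1-based nucleotide range encoding it.

    Assumption: the reading frame starts at nucleotide 1 of the sequence,
    so residue k is encoded by nucleotides 3k-2 through 3k.
    """
    if first_residue < 1 or last_residue < first_residue:
        raise ValueError("residue range must be 1-based and non-empty")
    return (3 * first_residue - 2, 3 * last_residue)


if __name__ == "__main__":
    # Amino acid ranges recited above for Sequence I.D. Nos. 2, 4 and 6.
    for last in (1154, 442, 1036):
        print(last, coding_nucleotide_range(1, last))
```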

Such partial sequences are useful as probes to 
identify and isolate homologues of the MOAT genes of the 
invention. Additionally, isolated nucleic acid sequences 
encoding natural allelic variants of the nucleic acids of 
Sequence I.D. Nos. 1, 3, 5 and 7 are also contemplated to
be within the scope of the present invention. The term 
natural allelic variants will be defined hereinbelow. 

According to another aspect of the present invention, 
antibodies immunologically specific for the human MOAT 
proteins described hereinabove are provided.

In yet another aspect of the invention, host cells
comprising at least one of the MOAT encoding nucleic acids
are provided. Such host cells include but are not limited
to bacterial cells, fungal cells, insect cells, mammalian
cells, and plant cells. Host cells overexpressing one
or more of the MOAT encoding nucleic acids of the
invention provide valuable research tools for assessing
transport of chemotherapeutic agents out of cells.
MOAT expressing cells also comprise a biological system
useful in methods for identifying inhibitors of the MOAT
transporters.

Another embodiment of the present invention 
encompasses methods for screening cells expressing MOAT 
encoding nucleic acids for chemotherapy resistance. Such 
methods will provide the clinician with data which 
correlates expression of a particular MOAT gene with a
particular chemotherapy resistant phenotype.

Diagnostic methods are also contemplated in the 
present invention. Accordingly, suitable oligonucleotide 
probes are provided which hybridize to the nucleic acids 
of the invention. Such probes may be used to advantage in 
screening biopsy samples for the expression of particular 
MOAT genes. Once a tumor sample has been characterized as 
to the MOAT gene(s) expressed therein, inhibitors 
identified in the cell line screening methods described 
above may be administered to prevent efflux of the 
beneficial chemotherapeutic agents from cancer cells. 

The methods of the invention may be applied to kits. 
An exemplary kit of the invention comprises MOAT gene 
specific oligonucleotide probes and/or primers, MOAT 
encoding DNA molecules for use as a positive control, 
buffers, and an instruction sheet. A kit for practicing
the cell line screening method includes frozen cells
comprising the MOAT genes of the invention, suitable
culture media, buffers and an instruction sheet. 

In a further aspect of the invention, transgenic 
knockout mice are disclosed. Mice will be generated in 
which at least one MOAT gene has been knocked out. Such 
mice will provide a valuable biological system for
assessing resistance to chemotherapy in an in vivo tumor
model.

Various terms relating to the biological molecules of 
the present invention are used hereinabove and also 
throughout the specification and claims. The terms 
"percent similarity" and "percent identity (identical)" 
are used as set forth in the UW GCG Sequence Analysis 
program (Devereux et al., NAR 12:387-397 (1984)).

With reference to nucleic acids of the invention, the 
term "isolated nucleic acid" is sometimes used. This 
term, when applied to DNA, refers to a DNA molecule that 
is separated from sequences with which it is immediately 
contiguous (in the 5' and 3' directions) in the naturally 
occurring genome of the organism from which it originates. 
For example, the "isolated nucleic acid" may comprise a 
DNA or cDNA molecule inserted into a vector, such as a 
plasmid or virus vector, or integrated into the genomic 
DNA of a prokaryote or eukaryote.

With respect to RNA molecules of the invention, the 
term "isolated nucleic acid" primarily refers to an RNA 
molecule encoded by an isolated DNA molecule as defined 
above. Alternatively, the term may refer to an RNA 
molecule that has been sufficiently separated from RNA 
molecules with which it would be associated in its natural 
state (i.e., in cells or tissues), such that it exists in 
a "substantially pure" form (the term "substantially pure" 
is defined below).

With respect to protein, the term "isolated protein"
or "isolated and purified protein" is sometimes used 
herein. This term refers primarily to a protein produced 
by expression of an isolated nucleic acid molecule of the 
invention. Alternatively, this term may refer to a 
protein which has been sufficiently separated from other 
proteins with which it would naturally be associated, so 
as to exist in "substantially pure" form. 

The term "substantially pure" refers to a preparation 
comprising at least 50-60% by weight the compound of 
interest (e.g., nucleic acid, oligonucleotide, protein, 
etc.). More preferably, the preparation comprises at 
least 75% by weight, and most preferably 90-99% by weight, 
the compound of interest. Purity is measured by methods 
appropriate for the compound of interest (e.g.,
chromatographic methods, agarose or polyacrylamide gel
electrophoresis, HPLC analysis, and the like). With
respect to antibodies of the invention, the term
"immunologically specific" refers to antibodies that bind
to one or more epitopes of a protein of interest (e.g.,
MOAT-B, MOAT-C or MOAT-D), but which do not substantially
recognize and bind other molecules in a sample containing
a mixed population of antigenic biological molecules.

With respect to nucleic acids and oligonucleotides, 
the term "specifically hybridizing" refers to the 
association between two single-stranded nucleotide
molecules of sufficiently complementary sequence to permit
such hybridization under pre-determined conditions
generally used in the art (sometimes termed "substantially
complementary"). When used in reference to a double
stranded nucleic acid, this term is intended to signify
that the double stranded nucleic acid has been subjected
to denaturing conditions, as is well known to those of
skill in the art. In particular, the term refers to
hybridization of an oligonucleotide with a substantially
complementary sequence contained within a single-stranded
DNA or RNA molecule of the invention, to the substantial
exclusion of hybridization of the oligonucleotide with
single-stranded nucleic acids of non-complementary
sequence.

One common formula for calculating the stringency
conditions required to achieve hybridization between
nucleic acid molecules of a specified sequence homology is
(Sambrook et al., 1989):

Tm = 81.5°C + 16.6 log[Na+] + 0.41(% G+C) - 0.63(% formamide) -
600/#bp in duplex

As an illustration of the above formula, using [Na+]
= [0.368] and 50% formamide, with GC content of 42% and an
average probe size of 200 bases, the Tm is 57°C. The Tm of
a DNA duplex decreases by 1 - 1.5°C with every 1% decrease
in homology. Thus, targets with greater than about 75%
sequence identity would be observed using a hybridization
temperature of 42°C. Such sequences would be considered
substantially homologous to the nucleic acid sequences of 
the invention. 
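
The worked illustration above can be reproduced with a short calculation. The following is a minimal sketch of the quoted formula, assuming that the logarithm is base 10, that [Na+] is expressed in mol/L and that the percentages are entered as whole numbers (e.g., 42 for 42%). The function and parameter names are hypothetical.

```python
import math


def melting_temperature(na_molar, gc_percent, formamide_percent, duplex_bp):
    """Estimate the Tm (in degrees C) of a DNA duplex using the formula quoted above.

    Assumptions: base-10 logarithm, [Na+] in mol/L, percentages as whole
    numbers (42 means 42%), duplex length in base pairs.
    """
    if na_molar <= 0 or duplex_bp <= 0:
        raise ValueError("salt concentration and duplex length must be positive")
    return (
        81.5
        + 16.6 * math.log10(na_molar)
        + 0.41 * gc_percent
        - 0.63 * formamide_percent
        - 600.0 / duplex_bp
    )


if __name__ == "__main__":
    # Values from the illustration: 0.368 M Na+, 42% GC, 50% formamide, 200 bases.
    tm = melting_temperature(0.368, 42, 50, 200)
    print(f"Tm = {tm:.1f} C")
```

With the values given in the illustration, the calculation yields approximately 57°C, in agreement with the text.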

The nucleic acids, proteins, antibodies, cell lines, 
methods, and kits of the present invention may be used to 
advantage to identify targets for the development of novel 
agents which inhibit the aberrant transport of cytotoxic
agents out of tumor cells. The transgenic mice of the
invention may be used as an in vivo model for chemotherapy
resistance.

The human MOAT molecules, methods and kits described
above may also be used as research tools and will
facilitate the elucidation of the mechanism by which tumor

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows the predicted structure of MOAT-B and 
comparison with human MRP. The vertical lines indicate 
identical amino acids and the vertical dots indicate 
conserved amino acids. Gaps are indicated by periods. 
The overbars indicate potential transmembrane spanning 
segments as predicted by the TMAP program. The first and 
second nucleotide binding folds (NBF 1 and NBF 2) are 
indicated by horizontal arrows. The C-terminal 34 amino 
acids (residues 1291-1325) are replaced in the second
class of MOAT-B cDNA clones by the following amino acids:
ILQKKLSTYWSH. The alignment was performed using the GAP
program (gap weight 3.0, length weight 0.1) in the 
Genetics Computer Group Package. H. MRP: human MRP. 

Figures 2A and 2B depict a comparison of the
nucleotide binding folds and hydropathy profile of MOAT-B
with those of other eukaryotic ABC transporters. Fig. 2A
shows the comparison of the nucleotide binding folds of
MOAT-B. Amino acids that are identical to those of MOAT-B
are shaded, and gaps are indicated by periods. Walker A
and B motifs, and the ABC transporter family signature
sequence C, are underlined. Amino acid positions are
indicated to the right. Amino acid sequences were aligned
using the PILEUP program (gap weight 3.0, length weight
0.1) in the Genetics Computer Group Package. Fig. 2B
shows a comparison of the MOAT-B hydropathy profile. To
facilitate comparison, the proteins are aligned so that
the N-terminal nucleotide binding folds (NBF) are roughly
in register. NBFs are indicated by bars. Values above
and below the horizontal lines indicate hydrophobic and
hydrophilic regions, respectively. Hydrophobicity plots
were generated using the Kyte-Doolittle algorithm with a
window of 7 residues. The transporters shown are: human
multidrug-associated protein, H. MRP (P33529); human
multispecific organic anion transporter, H. MOAT (U63970);
Saccharomyces cerevisiae yeast cadmium factor 1, S. YCF1
(P39109); rat sulfonylurea receptor, R. SUR (Q09427);
human cystic fibrosis transmembrane conductance regulator,
H. CFTR (M28668); Leishmania P-glycoprotein, L. PgpA
(P21441) and human mdr1 gene product, H. MDR1 (P08183).
Accession numbers are shown in parentheses.
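
For readers unfamiliar with hydropathy plots of the kind described above, the following minimal sketch computes a Kyte-Doolittle profile with a window of 7 residues. It uses the published Kyte-Doolittle hydropathy scale and a simple sliding-window mean; the function name and the choice to report one value per full window are assumptions, and the sketch is not the program used to generate the figures.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Return the mean hydropathy of each full window along the sequence.

    Positive values indicate hydrophobic stretches and negative values
    hydrophilic stretches. Non-standard residue codes raise KeyError.
    """
    seq = protein.upper()
    if window < 1 or len(seq) < window:
        return []
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical peptide: a hydrophobic stretch followed by a charged one.
    for position, value in enumerate(hydropathy_profile("LLIVAVLLKRDEEKRD"), start=1):
        print(position, round(value, 2))
```

Runs of consecutive windows with strongly positive values are the usual candidates for membrane spanning segments, which is how the hydrophobic domains of the transporters appear in such plots.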

Figure 3 is a Northern blot showing the tissue 
distribution of MOAT-B transcript. Membranes containing 
poly(A)+ RNA prepared from human tissues were hybridized
with a radiolabeled MOAT-B or GAPDH probe. Top panels 
show MOAT-B transcript and bottom panels show the control 
GAPDH transcript. Arrows indicate the position of MOAT-B 
transcript. Prolonged exposure of the film revealed a low 
level signal in liver. 

Figure 4 shows the chromosomal localization of the 
gene encoding MOAT-B. Human metaphase spreads were 
hybridized with a biotin-labeled MOAT-B cDNA probe and
detected by FITC-conjugated avidin. Hybridization signals
at chromosome 13q32 in two metaphase spreads are indicated
by arrows. The inset shows paired hybridization signals
at band q32 of chromosome 13 from three other metaphase
spreads.

Figures 5A and 5B show the predicted structures of 
MOAT-C and MOAT-D. Fig. 5A presents the structure of 
MOAT-C. Fig. 5B shows the structure of MOAT-D. Numbered
overbars indicate potential transmembrane spanning
helices. Horizontal arrows indicate the positions of the
amino terminal (NBF1) and C-terminal (NBF2) nucleotide
binding folds. Walker A and B motifs, and the ABC
transporter family signature sequence C are underlined.
Bullets indicate the positions of potential N-linked
glycosylation sites that are conserved with previously
reported N-glycosylation sites in MRP. The indicated
MOAT-C transmembrane spanning helices were predicted using 
the TMAP program and an input alignment of MOAT-B and 
MOAT-C. The indicated MOAT-D transmembrane helices are 
based upon inspection of an alignment with MRP. 

Figures 6A and 6B show a comparison of the nucleotide 
binding folds and hydropathy profiles of MOAT-C and MOAT-D 
with those of other related ABC transporters. Fig. 6A 
depicts the comparison of the nucleotide binding folds. 
The alignment was produced using the PILEUP command (gap 
weight 3.0, length weight 0.1) in the Genetics Computer 
Group Package Version 9.1. Amino acid positions conserved 
in at least 4 of the 8 proteins are shaded. Periods 
indicate gaps in the alignment. Walker A and B, and the 
ABC transporter family signature sequence C are indicated 
by underbars. Fig. 6B shows the comparison of hydropathy
profiles. To facilitate comparisons, gaps were introduced
at the N-termini of some proteins in order to bring the
first nucleotide binding folds into register. Nucleotide
binding folds are indicated by bars. Values above and
below the horizontal lines indicate hydrophobic and
hydrophilic regions, respectively. Hydrophobicity plots
were generated using the Kyte-Doolittle algorithm with a
window of 7 residues. Accession numbers are as follows:
MRP, P33529; cMOAT, U63970; SUR, Q09428; CFTR, P13569;
MDR1, P08183.



Figure 7 is a Northern blot showing the tissue 
distribution of MOAT-C and MOAT-D transcripts. Blots 
containing poly A+ RNA prepared from various human tissues 
were hybridized with MOAT-C, MOAT-D and actin probes.
Arrows indicate the position of the MOAT-C (top panel) and 
MOAT-D (middle panel) transcripts. The bottom panel shows 
the control actin transcript. 

Figures 8A and 8B show the chromosomal localization 
of the MOAT-C and MOAT-D genes. Human metaphase spreads 
were hybridized with biotin-labeled MOAT-C and MOAT-D
cDNA probes and detected by FITC-conjugated avidin. Fig.
8A shows the localization of MOAT-C. Hybridization
signals at chromosome 3q27 in two metaphase spreads are
indicated by arrows (top). The inset shows paired
hybridization signals at band q27 of chromosome 3 from
three other metaphase spreads. Fig. 8B shows the
localization of MOAT-D. Hybridization signals at
chromosome 17q21-22 in two metaphase spreads are indicated
by arrows (top). The inset shows paired hybridization
signals at band q21-22 of chromosome 17 from three other 
metaphase spreads. 

Figure 9 shows the predicted amino acid sequence of
MOAT-E. Also shown are the locations of the potential
transmembrane helices (overbars), the potential
N-glycosylation site (black dot) and the two nucleotide
binding folds (NBF1 and NBF2). Walker A and B motifs, as
well as the signature C motif of ABC transporters, are
also indicated.

Figure 10 shows a comparison of the hydropathy
profile of MOAT-E with other members of the MRP-cMOAT
subfamily. The profile reveals that MOAT-E has a
hydrophobic N-terminal segment which is absent in MOAT-B
and MOAT-C.

Figure 11 is an RNA blot which reveals that MOAT-E is
expressed only in the liver and the kidney, suggesting 
that MOAT-E may participate in the excretion of substances 
into urine and bile. The lower panel shows hybridization 
of an actin probe to assess RNA loading. 

Figures 12A-12J show the cDNA (SEQ ID NO: 1) and
amino acid sequences (SEQ ID NO: 2) encoded by MOAT-B.

Figures 13A-13K show the cDNA (SEQ ID NO: 3) and
amino acid sequences (SEQ ID NO: 4) encoded by MOAT-C.

Figures 14A-14K show the cDNA (SEQ ID NO: 5) and
amino acid sequences (SEQ ID NO: 6) encoded by MOAT-D.

Figures 15A-15K show the cDNA (SEQ ID NO: 7) and
amino acid sequences (SEQ ID NO: 8) encoded by MOAT-E.

DETAILED DESCRIPTION OF THE INVENTION 

MRP and cMOAT are closely related mammalian ABC 
transporters that export organic anions from cells. 
Transfection studies have established that MRP confers 
resistance to natural product cytotoxic agents, and recent 
evidence suggests the possibility that cMOAT may 
contribute to cytotoxic drug resistance as well. Based 
upon the potential importance of these transporters in 
clinical drug resistance, and their important
physiological roles in the export of the amphiphilic 
products of phase I and phase II metabolism, we sought to 
identify other MRP-related transporters. Using a 
degenerate PCR approach, a cDNA molecule was isolated 
which encodes a novel ABC transporter designated herein as 
MOAT-B. The MOAT-B gene was mapped using fluorescence in
situ hybridization to chromosome band 13q32. Comparison
of the MOAT-B predicted protein with other transporters
revealed that it is most closely related to MRP, cMOAT,
and the yeast organic anion transporter YCF1. While
MOAT-B is closely related to these transporters, it is
distinguished by the absence of the approximately 200 amino
acid N-terminal hydrophobic extension that is present in
MRP and cMOAT, and which is predicted to encode several
transmembrane spanning segments. In addition, the MOAT-B 
tissue distribution is distinct from MRP and cMOAT. In 
contrast to MRP, which is widely expressed in most 
tissues, including liver, and cMOAT, whose expression is 
largely restricted to liver, the MOAT-B transcript is 
widely expressed, with particularly high levels in 
prostate, but is barely detectable in liver. These data 
indicate that MOAT-B is a ubiquitously expressed 
transporter that is closely related to MRP and cMOAT, and 
indicate that it is an organic anion pump relevant to 
cellular detoxification. 

Three additional MRP/cMOAT-related transporters, 
MOAT-C, MOAT-D and MOAT-E are also disclosed herein. 
MOAT-C encodes a 1437 amino acid protein that is most 
closely related to MRP, cMOAT and MOAT-B, among eukaryotic 
transporters (33%-37% identity). However, based upon
amino acid identity, MOAT-C is considerably less related
to MRP and cMOAT than the latter transporters are to each
other (48% identity). In addition, the MOAT-C topology is
distinct from that of MRP and cMOAT in that it, like
MOAT-B, lacks an N-terminal transmembrane spanning domain.
MOAT-D encodes a 1530 amino acid transporter that is
highly related to MRP (57% identity) and cMOAT (47%
identity). MOAT-E encodes a 1503 amino acid transporter
that is highly related to MOAT-D, MRP and cMOAT (39-45%
identity). The topologies of MOAT-D and MOAT-E are quite
similar to those of MRP and cMOAT, in that they have an
N-terminal hydrophobic extension that is predicted to harbor
five transmembrane spanning helices. MOAT-C and MOAT-D were
mapped to chromosome bands 3q27 and 17q21-22, 
respectively, by fluorescence in situ hybridization. 

The expression patterns of MOAT-C, MOAT-D and MOAT-E 
are distinct from those of MRP, cMOAT and MOAT-B. MOAT-C 
transcript is widely expressed, with highest levels in 
skeletal muscle, kidney and testis, but is expressed at 
barely detectable levels in liver and lung. MOAT-D 
transcript has a more restricted expression pattern, with 
high levels in colon, pancreas, liver and kidney. Data 
presented herein reveal that MOAT-E expression is 
restricted to liver and kidney. 

Based upon degree of amino acid identity, and protein 
topology, the MRP-related transporters fall into two 
groups, with the first group consisting of MRP, cMOAT, 
MOAT-D and MOAT-E, and the second group consisting of 
MOAT-B and MOAT-C. The isolation of MOAT-C, MOAT-D and 
MOAT-E thus helps to define the MRP/cMOAT subfamily. The 
high degree of amino acid identity and topological 
similarity of MOAT-D and MOAT-E to MRP and cMOAT suggest 
that they function as organic anion transporters, and play 
a role in cytotoxic drug resistance. In contrast, the 
lower degree of amino acid identity and distinct topology
of MOAT-B and MOAT-C suggest the possibility that their
substrate specificities and functions may be distinct from
those of MRP, cMOAT, MOAT-D and MOAT-E.
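
By way of illustration only, the percent identity figures quoted above can be read as the share of aligned positions at which two sequences carry the same residue. The following Python sketch assumes that the two sequences have already been aligned, that gaps are written as "-", and that identity is counted only over ungapped columns. The function name, the gap convention and the toy sequences are assumptions made for this example; the disclosure does not state which alignment method or denominator was used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumes both inputs come from the same pairwise alignment (equal length,
    gaps written as '-'). This denominator convention is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical toy alignment, not MOAT sequence data.
print(percent_identity("MKT-LV", "MRTALV"))
```

Other conventions, such as dividing by the length of the shorter sequence, give somewhat different values, so identity figures are comparable only when computed the same way.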

The compositions, methods, kits and transgenic mice 
of the invention disclosed herein will facilitate the 
identification of drugs that cripple the ability of MOAT 
genes and proteins encoded thereby to effect the efflux of 
clinically beneficial pharmacological agents in malignant 
cells.

I. Preparation of MOAT-Encoding Nucleic Acid Molecules, 
MOAT Proteins, and Antibodies Thereto 
A. Nucleic Acid Molecules 

Nucleic acid molecules encoding the MOAT proteins of 
the invention may be prepared by two general methods: (1) 
synthesis from appropriate nucleotide triphosphates, or 
(2) isolation from biological sources. Both methods 
utilize protocols well known in the art. The availability 
of nucleotide sequence information, such as cDNAs having 
Sequence I.D. Nos. 1, 3, 5, or 7, enables preparation of an
isolated nucleic acid molecule of the invention by 
oligonucleotide synthesis. Synthetic oligonucleotides may 
be prepared by the phosphoramidite method employed in the 
Applied Biosystems 38A DNA Synthesizer or similar devices.
The resultant construct may be purified according to 
methods known in the art, such as high performance liquid 
chromatography (HPLC). Long, double-stranded
polynucleotides, such as a DNA molecule of the present 
invention, must be synthesized in stages, due to the size 
limitations inherent in current oligonucleotide synthetic 
methods. Thus, for example, a 5 kb double-stranded 
molecule may be synthesized as several smaller segments of 
appropriate complementarity. Complementary segments thus
produced may be annealed such that each segment possesses
appropriate cohesive termini for attachment of an adjacent
segment. Adjacent segments may be ligated by annealing
cohesive termini in the presence of DNA ligase to
construct an entire 5 kb double-stranded molecule. A
synthetic DNA molecule so constructed may then be cloned 
and amplified in an appropriate vector. 
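
The staged synthesis described above may be sketched in simplified form as follows. This Python example splits one strand of a target sequence into segments whose ends overlap by a fixed number of bases, the overlap standing in for the cohesive termini, and then rejoins them while checking that adjacent ends match. The segment length of 500, the overlap of 20 and the placeholder sequence are assumptions chosen for illustration; the sketch models only one strand and does not represent annealing or ligation chemistry.

```python
def split_into_segments(seq: str, segment_len: int, overlap: int) -> list:
    """Split seq into segments; adjacent segments share `overlap` bases."""
    if not 0 < overlap < segment_len:
        raise ValueError("overlap must be positive and shorter than a segment")
    step = segment_len - overlap
    segments = []
    start = 0
    while True:
        segments.append(seq[start:start + segment_len])
        if start + segment_len >= len(seq):
            break
        start += step
    return segments


def assemble(segments: list, overlap: int) -> str:
    """Rejoin segments, verifying that each junction shares the expected terminus."""
    assembled = segments[0]
    for seg in segments[1:]:
        if assembled[-overlap:] != seg[:overlap]:
            raise ValueError("adjacent segments do not share the expected terminus")
        assembled += seg[overlap:]
    return assembled


# Hypothetical 5 kb placeholder sequence, not a MOAT cDNA.
target = "ACGT" * 1250
pieces = split_into_segments(target, segment_len=500, overlap=20)
assert assemble(pieces, overlap=20) == target
```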

Nucleic acid sequences encoding the MOAT proteins of 
the invention may be isolated from appropriate biological 
sources using methods known in the art. In a preferred 
embodiment, a cDNA clone is isolated from a cDNA 
expression library of human origin. In an alternative 
embodiment, utilizing the sequence information provided by 
the cDNA sequence, human genomic clones encoding MOAT 
proteins may be isolated. Alternatively, cDNA or genomic 
clones having homology with MOAT-B, MOAT-C, MOAT-D or 
MOAT-E may be isolated from other species using
oligonucleotide probes corresponding to predetermined 
sequences within the MOAT encoding nucleic acids. 

In accordance with the present invention, nucleic 
acids having the appropriate level of sequence homology 
with the protein coding region of Sequence I.D. Nos. 1, 3,
5, and 7 may be identified by using hybridization and
washing conditions of appropriate stringency. For
example, hybridizations may be performed, according to the
method of Sambrook et al. (supra), using a hybridization
solution comprising: 5X SSC, 5X Denhardt's reagent, 1.0%
SDS, 100 µg/ml denatured, fragmented salmon sperm DNA,
0.05% sodium pyrophosphate and up to 50% formamide.
Hybridization is carried out at 37-42°C for at least six
hours. Following hybridization, filters are washed as
follows: (1) 5 minutes at room temperature in 2X SSC and
1% SDS; (2) 15 minutes at room temperature in 2X SSC and
0.1% SDS; (3) 30 minutes-1 hour at 37°C in 1X SSC and 1%
SDS; (4) 2 hours at 42-65°C in 1X SSC and 1% SDS, changing
the solution every 30 minutes.
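
For convenience, the four-step wash schedule above may be written down as structured data, for example when scripting a protocol checklist. The Python sketch below records each wash with the times, temperatures and buffer compositions given in the text and sums the minimum and maximum total wash time. The class and field names are assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WashStep:
    min_minutes: int
    max_minutes: int
    temperature: str
    ssc: str
    sds_percent: float
    change_every_minutes: Optional[int] = None  # solution change interval, if any


# Values transcribed from the wash conditions described in the text.
WASHES = [
    WashStep(5, 5, "room temperature", "2X", 1.0),
    WashStep(15, 15, "room temperature", "2X", 0.1),
    WashStep(30, 60, "37 C", "1X", 1.0),
    WashStep(120, 120, "42-65 C", "1X", 1.0, change_every_minutes=30),
]


def total_wash_minutes(steps):
    """Return (shortest, longest) total wash time in minutes."""
    return (sum(s.min_minutes for s in steps),
            sum(s.max_minutes for s in steps))


print(total_wash_minutes(WASHES))
```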

Nucleic acids of the present invention may be 
maintained as DNA in any convenient cloning vector. In a 
preferred embodiment, clones are maintained in a plasmid 
cloning/expression vector, such as pBluescript 
(Stratagene, La Jolla, CA) , which is propagated in a 
suitable E. coli host cell. 

MOAT-encoding nucleic acid molecules of the invention
include cDNA, genomic DNA, RNA, and fragments thereof
which may be single- or double-stranded. Thus, this
invention provides oligonucleotides (sense or antisense 
strands of DNA or RNA) having sequences capable of 
hybridizing with at least one sequence of a nucleic acid 
molecule of the present invention, such as selected 
segments of the cDNA having Sequence I.D. No. 1. Such 
oligonucleotides are useful as probes for detecting or 
isolating MOAT genes. Antisense nucleic acid molecules 
may be targeted to translation initiation sites and/or 
splice sites to inhibit the translation of the 
MOAT-encoding nucleic acids of the invention. Such 
antisense molecules are typically between 15 and 30
nucleotides in length and often span the translational
start site of MOAT-encoding mRNA molecules.
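
As a minimal sketch of how an antisense sequence spanning a translational start site might be derived, the following Python example takes an mRNA sequence and the position of its AUG start codon, selects a window of 15 to 30 nucleotides that includes the AUG, and returns the reverse complement as a DNA oligonucleotide. The function name, the default window placement and the toy mRNA are assumptions for illustration only; the example does not address specificity, secondary structure or other design criteria a practitioner would consider.

```python
# Map RNA bases to their complementary DNA bases.
DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def antisense_oligo(mrna: str, start_codon_index: int,
                    length: int = 20, upstream: int = 8) -> str:
    """Return a DNA antisense oligo complementary to a window spanning the AUG."""
    if not 15 <= length <= 30:
        raise ValueError("length outside the 15-30 nucleotide range given in the text")
    if upstream + 3 > length:
        raise ValueError("window would not span the whole start codon")
    if mrna[start_codon_index:start_codon_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    window_start = start_codon_index - upstream
    window_end = window_start + length
    if window_start < 0 or window_end > len(mrna):
        raise ValueError("window extends beyond the mRNA")
    target = mrna[window_start:window_end]
    # Complement each base, then reverse so the oligo reads 5' to 3'.
    return target.translate(DNA_COMPLEMENT)[::-1]


# Hypothetical toy mRNA, not a MOAT transcript.
toy_mrna = "GCCGCCACCAUGGCGAAGCUUGGAUCC"
print(antisense_oligo(toy_mrna, toy_mrna.find("AUG"), length=15, upstream=6))
```

In a real transcript the first AUG found by a text search need not be the true initiation codon, so the start position would ordinarily be taken from the annotated coding sequence.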

It will be appreciated by persons skilled in the art 
that variants of these sequences exist in the human 
population, and must be taken into account when designing 
and/or utilizing oligos of the invention. Accordingly, it 
is within the scope of the present invention to encompass 
such variants, with respect to the MOAT sequences 
disclosed herein or the oligos targeted to specific 
locations on the respective genes or RNA transcripts.
With respect to the inclusion of such variants, the term
"natural allelic variants" is used herein to refer to 
various specific nucleotide sequences and variants thereof 
that would occur in a human population. The usage of 
different wobble codons and genetic polymorphisms which 
give rise to conservative or neutral amino acid 
substitutions in the encoded protein are examples of such 
variants. Additionally, the term "substantially 
complementary" refers to oligo sequences that may not be 
perfectly matched to a target sequence, but the mismatches 
do not materially affect the ability of the oligo to 
hybridize with its target sequence under the conditions 
described. 

Full-length MOAT-B, MOAT-C, MOAT-D and MOAT-E
proteins of the present invention may be prepared in a
variety of ways, according to known methods. The proteins
may be purified from appropriate sources, e.g.,
transformed bacterial or animal cultured cells or tissues,
by immunoaffinity purification. However, this is not a
preferred method due to the low amount of protein likely 
to be present in a given cell type at any time. The 
availability of nucleic acid molecules encoding MOAT 
proteins enables production of the proteins using in vitro 
expression methods known in the art. For example, a cDNA 
or gene may be cloned into an appropriate in vitro 
transcription vector, such as pSP64 or pSP65 for in vitro 
transcription, followed by cell-free translation in a 
suitable cell-free translation system, such as wheat germ 
or rabbit reticulocytes. In vitro transcription and
translation systems are commercially available, e.g., from
Promega Biotech, Madison, Wisconsin or Gibco-BRL,
Gaithersburg, Maryland.

Alternatively, according to a preferred embodiment, 
larger quantities of MOAT proteins may be produced by 
expression in a suitable prokaryotic or eukaryotic system. 
For example, part or all of a DNA molecule, such as a cDNA 
having Sequence I.D. No. 1, 3, 5 or 7 may be inserted into 
a plasmid vector adapted for expression in a bacterial 
cell, such as E. coli. Such vectors comprise the 
regulatory elements necessary for expression of the DNA in 
the host cell positioned in such a manner as to permit 
expression of the DNA in the host cell. Such regulatory 
elements required for expression include promoter 
sequences, transcription initiation sequences and, 
optionally, enhancer sequences. 

The human MOAT proteins produced by gene expression 
in a recombinant prokaryotic or eukaryotic system may be
purified according to methods known in the art. In a 
preferred embodiment, a commercially available 
expression/secretion system can be used, whereby the 
recombinant protein is expressed and thereafter secreted 
from the host cell, to be easily purified from the 
surrounding medium. If expression/secretion vectors are 
not used, an alternative approach involves purifying the 
recombinant protein by affinity separation, such as by 
immunological interaction with antibodies that bind 
specifically to the recombinant protein, or by nickel columns
for isolation of recombinant proteins tagged with 6-8
histidine residues at their N-terminus or C-terminus.
Alternative tags may comprise the FLAG epitope or the 
hemagglutinin epitope. Such methods are commonly used by 
skilled practitioners. 

The human MOAT proteins of the invention, prepared by 
the aforementioned methods, may be analyzed according to
standard procedures. For example, such proteins may be
subjected to amino acid sequence analysis, according to 
known methods. 

The present invention also provides antibodies 
capable of immunospecifically binding to proteins of the
invention. Polyclonal antibodies directed toward human
MOAT proteins may be prepared according to standard
methods. In a preferred embodiment, monoclonal antibodies
are prepared, which react immunospecifically with the
various epitopes of the MOAT proteins described herein. 
Monoclonal antibodies may be prepared according to general 
methods of Kohler and Milstein, following standard 
protocols. Polyclonal or monoclonal antibodies that 
immunospecifically interact with MOAT proteins can be 
utilized for identifying and purifying such proteins. For 
example, antibodies may be utilized for affinity 
separation of proteins with which they immunospecifically 
interact. Antibodies may also be used to 
immunoprecipitate proteins from a sample containing a 
mixture of proteins and other biological molecules. Other 
uses of anti-MOAT antibodies are described below. 

II. Uses of MOAT-Encoding Nucleic Acids, 
MOAT Proteins and Antibodies Thereto

Cellular transporter molecules have received a great 
deal of attention as potential targets of chemotherapeutic 
agents designed to effectively block the export of 
pharmacological reagents from tumor cells. The MOAT 
proteins of the invention play a pivotal role in the 
transport of molecules across the cell membrane. 

Additionally, MOAT nucleic acids, proteins and 
antibodies thereto, according to this invention, may be 
used as research tools to identify other proteins that are
intimately involved in the transport of molecules into and
out of cells. Biochemical elucidation of molecular
mechanisms which govern such transport will facilitate the
development of novel anti-transport agents that may
sensitize tumor cells to conventional chemotherapeutic
agents.

A. MOAT-Encoding Nucleic Acids 

MOAT-encoding nucleic acids may be used for a variety 
of purposes in accordance with the present invention. 
MOAT-encoding DNA, RNA, or fragments thereof may be used 
as probes to detect the presence of and/or expression of 
genes encoding MOAT proteins. Methods in which 
MOAT-encoding nucleic acids may be utilized as 
probes for such assays include, but are not limited to: 
(1) in situ hybridization; (2) Southern hybridization; (3)
northern hybridization; and (4) assorted amplification
reactions such as polymerase chain reactions (PCR).

The MOAT-encoding nucleic acids of the invention may 
also be utilized as probes to identify related genes from 
other animal species. As is well known in the art, 
hybridization stringencies may be adjusted to allow 
hybridization of nucleic acid probes with complementary 
sequences of varying degrees of homology. Thus, 
MOAT-encoding nucleic acids may be used to advantage to 
identify and characterize other genes of varying degrees 
of relation to the MOAT genes of the invention. Such 
information enables further characterization of 
transporter molecules which give rise to the 
chemoresistant phenotype of certain tumors. Additionally, 
they may be used to identify genes encoding proteins that 
interact with MOAT proteins (e.g., by the "interaction
trap" technique), which should further accelerate
identification of the components involved in the
acquisition of drug resistance. The MOAT-encoding nucleic
acids may also be used to generate primer sets suitable 
for PCR amplification of target MOAT DNA. Criteria for 
selecting suitable primers are well known to those of 
ordinary skill in the art. 

Nucleic acid molecules, or fragments thereof, 
encoding MOAT genes may also be utilized to control the 
production of MOAT proteins, thereby regulating the amount
of protein available to participate in cytotoxic drug 
efflux. As mentioned above, antisense oligonucleotides 
corresponding to essential processing sites in 
MOAT-encoding mRNA molecules may be utilized to inhibit 
MOAT protein production in targeted cells. Alterations in 
the physiological amount of MOAT proteins may dramatically 
affect the ability of these proteins to transport 
pharmacological reagents out of the cell. 

Host cells comprising at least one MOAT encoding DNA 
molecule are encompassed in the present invention. Host 
cells contemplated for use in the present invention 
include but are not limited to bacterial cells, fungal 
cells, insect cells, mammalian cells, and plant cells. 
The MOAT-encoding DNA molecules may be introduced singly into
such host cells or in combination to assess the phenotype 
of cells conferred by such expression. Methods for 
introducing DNA molecules are also well known to those of 
ordinary skill in the art. Such methods are set forth in 
Ausubel et al., eds., Current Protocols in Molecular
Biology, John Wiley & Sons, NY, NY 1995, the disclosure of 
which is incorporated by reference herein. 

The availability of MOAT encoding nucleic acids 
enables the production of strains of laboratory mice 
carrying part or all of the MOAT genes or mutated
sequences thereof. Such mice may provide an in vivo model
for development of novel chemotherapeutic agents.
Alternatively, the MOAT nucleic acid sequence information
provided herein enables the production of knockout mice in
which the endogenous genes encoding MOAT-B, MOAT-C, MOAT-D
or MOAT-E have been specifically inactivated. Methods of 
introducing transgenes in laboratory mice are known to 
those of skill in the art. Three common methods include: 
1. integration of retroviral vectors encoding the foreign 
gene of interest into an early embryo; 2. injection of 
DNA into the pronucleus of a newly fertilized egg; and 3. 
the incorporation of genetically manipulated embryonic 
stem cells into an early embryo. 

The alterations to the MOAT gene envisioned herein 
include modifications, deletions, and substitutions. 
Modifications and deletions render the naturally occurring 
gene nonfunctional, producing a "knock out" animal. 
Substitution of the naturally occurring gene for a gene
from a second species results in an animal which produces 
an MOAT gene from the second species. Substitution of the 
naturally occurring gene for a gene having a mutation 
results in an animal with a mutated MOAT protein. A 
transgenic mouse carrying the human MOAT gene is generated 
by direct replacement of the mouse MOAT gene with the 
human gene. These transgenic animals are valuable for use
in in vivo assays for elucidation of other medical disorders
associated with cellular activities modulated by MOAT 
genes. A transgenic animal carrying a "knock out" of a 
MOAT encoding nucleic acid is useful for the establishment 
of a nonhuman model for chemotherapy resistance involving 
MOAT regulation. 

As a means to define the role that MOAT plays in 
mammalian systems, mice can be generated that cannot make
MOAT proteins because of a targeted mutational disruption
of a MOAT gene. 

The term "animal" is used herein to include all 
vertebrate animals, except humans. It also includes an 
individual animal in all stages of development, including 
embryonic and fetal stages. A "transgenic animal" is any 
animal containing one or more cells bearing genetic 
information altered or received, directly or indirectly, 
by deliberate genetic manipulation at the subcellular 
level, such as by targeted recombination or microinjection 
or infection with recombinant virus. The term "transgenic 
animal" is not meant to encompass classical cross-breeding 
or in vitro fertilization, but rather is meant to 
encompass animals in which one or more cells are altered 
by or receive a recombinant DNA molecule. This molecule 
may be specifically targeted to a defined genetic locus, be
randomly integrated within a chromosome, or it may be 
extrachromosomally replicating DNA. The term "germ cell 
line transgenic animal" refers to a transgenic animal in 
which the genetic alteration or genetic information was 
introduced into a germ line cell, thereby conferring the 
ability to transfer the genetic information to offspring. 
If such offspring, in fact, possess some or all of that
alteration or genetic information, then they, too, are 
transgenic animals. 

The alteration or genetic information may be foreign 
to the species of animal to which the recipient belongs, 
or foreign only to the particular individual recipient, or 
may be genetic information already possessed by the 
recipient. In the last case, the altered or introduced 
gene may be expressed differently than the native gene. 

The altered MOAT gene generally should not fully 
encode the same MOAT protein native to the host animal and
its expression product should be altered to a minor or
great degree, or absent altogether. However, it is 
conceivable that a more modestly modified MOAT gene will 
fall within the compass of the present invention if it is 
a specific alteration. 

The DNA used for altering a target gene may be 
obtained by a wide variety of techniques that include, but 
are not limited to, isolation from genomic sources, 
preparation of cDNAs from isolated mRNA templates, direct 
synthesis, or a combination thereof. 

A preferred type of target cell for transgene 
introduction is the embryonal stem cell (ES). ES cells
may be obtained from pre-implantation embryos cultured in
vitro. Transgenes can be efficiently introduced into the
ES cells by standard techniques such as DNA transfection
or by retrovirus-mediated transduction. The resultant
transformed ES cells can thereafter be combined with
blastocysts from a non-human animal. The introduced ES
cells thereafter colonize the embryo and contribute to the 
germ line of the resulting chimeric animal. 

One approach to the problem of determining the 
contributions of individual genes and their expression 
products is to use isolated MOAT genes to selectively 
inactivate the wild-type gene in totipotent ES cells (such 
as those described above) and then generate transgenic 
mice. The use of gene-targeted ES cells in the generation
of gene-targeted transgenic mice is known in the art. 

Techniques are available to inactivate or alter any 
genetic region to a mutation desired by using targeted 
homologous recombination to insert specific changes into 
chromosomal alleles. However, in comparison with 
homologous extrachromosomal recombination, which occurs at 
a frequency approaching 100%, homologous plasmid-chromosome
recombination was originally reported to only
be detected at frequencies between 10^-6 and 10^-3.
Nonhomologous plasmid-chromosome interactions are more
frequent, occurring at levels 10^5-fold to 10^2-fold greater
than comparable homologous insertion.
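
The practical consequence of these ratios can be illustrated with simple arithmetic. If nonhomologous insertions are R-fold more frequent than homologous ones, roughly 1 in (1 + R) integrants is a homologous recombinant. The Python sketch below computes that fraction and estimates how many clones would need to be screened, without any selection, to have a given chance of recovering at least one homologous recombinant. The 95% confidence level and the assumption that clones are independent are choices made for this example and are not taken from the disclosure.

```python
import math


def homologous_fraction(fold_excess: float) -> float:
    """Fraction of integrants that are homologous when nonhomologous
    events are `fold_excess` times more frequent."""
    return 1.0 / (1.0 + fold_excess)


def clones_to_screen(fraction: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one homologous clone among n) >= confidence,
    assuming independent clones (an assumption of this sketch)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction))


for fold in (1e2, 1e5):
    p = homologous_fraction(fold)
    print(f"{fold:.0e}-fold excess: fraction {p:.2e}, clones {clones_to_screen(p)}")
```

The steep rise in the number of clones across this range is the motivation for the screening and selection strategies discussed next.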

To overcome this low proportion of targeted 
recombination in murine ES cells, various strategies have 
been developed to detect or select rare homologous 
recombinants. One approach for detecting homologous 
alteration events uses the polymerase chain reaction (PCR) 
to screen pools of transformant cells for homologous 
insertion, followed by screening of individual clones. 
Alternatively, a positive genetic selection approach has 
been developed in which a marker gene is constructed which 
will only be active if homologous insertion occurs, 
allowing these recombinants to be selected directly. One 
of the most powerful approaches developed for selecting 
homologous recombinants is the positive-negative selection 
(PNS) method developed for genes for which no direct 
selection of the alteration exists. The PNS method is 
more efficient for targeting genes which are not expressed 
at high levels because the marker gene has its own 
promoter. Non-homologous recombinants are selected
against by using the Herpes Simplex virus thymidine kinase
(HSV-TK) gene and selecting against its nonhomologous
insertion with effective herpes drugs such as gancyclovir
(GANC) or 1-(2-deoxy-2-fluoro-β-D-arabinofuranosyl)-5-
iodouracil (FIAU). By this counter selection, the number
of homologous recombinants in the surviving transformants
can be increased. 

As used herein, a "targeted gene" or "knock-out" is a 
DNA sequence introduced into the germline of a non-human
animal by way of human intervention, including but not
limited to, the methods described herein. The targeted
genes of the invention include DNA sequences which are 
designed to specifically alter cognate endogenous alleles. 

Methods of use for the transgenic mice of the 
invention are also provided herein. Knockout mice of the 
invention can be injected with tumor cells or treated with 
carcinogens to generate carcinomas. Such mice provide a 
biological system for assessing chemotherapy resistance as 
modulated by a MOAT gene of the invention. Accordingly, 
therapeutic agents which inhibit the action of these 
transporters and thereby prevent efflux of beneficial 
chemotherapeutic agents from tumor cells may be screened 
in studies using MOAT knock out mice. 

As described above, MOAT-encoding nucleic acids are 
also used to advantage to produce large quantities of 
substantially pure MOAT proteins, or selected portions 
thereof.

B. MOAT Proteins and Antibodies 

Purified full length MOAT proteins, or fragments 
thereof, may be used to produce polyclonal or monoclonal 
antibodies which also may serve as sensitive detection 
reagents for the presence and accumulation of MOAT 
proteins (or complexes containing MOAT proteins) in 
mammalian cells. Recombinant techniques enable expression 
of fusion proteins containing part or all of MOAT 
proteins. The full length proteins or fragments of the 
proteins may be used to advantage to generate an array of 
monoclonal antibodies specific for various epitopes of 
MOAT proteins, thereby providing even greater sensitivity 
for detection of MOAT proteins in cells. 

Polyclonal or monoclonal antibodies 
immunologically specific for MOAT proteins may be used in
a variety of assays designed to detect and quantitate the
proteins. Such assays include, but are not limited to: 
(1) flow cytometric analysis; (2) immunochemical 
localization of MOAT proteins in tumor cells; and (3) 
immunoblot analysis (e.g., dot blot, Western blot) of 
extracts from various cells. Additionally, as described 
above, anti-MOAT antibodies can be used for purification 
of MOAT proteins and any associated subunits (e.g., 
affinity column purification, immunoprecipitation).

From the foregoing discussion, it can be seen that 
MOAT-encoding nucleic acids, MOAT expressing vectors, MOAT 
proteins and anti-MOAT antibodies of the invention can be 
used to detect MOAT gene expression and alter MOAT protein 
accumulation for purposes of assessing the genetic and 
protein interactions involved in the development of drug 
resistance in tumor cells. 

C. Methods and Kits Employing the
Compositions of the Present Invention

From the foregoing discussion, it can be seen 
that MOAT-encoding nucleic acids, MOAT-expressing vectors,
MOAT proteins and anti-MOAT antibodies of the invention 
can be used to detect MOAT gene expression and alter MOAT 
protein accumulation for purposes of assessing the genetic 
and protein interactions giving rise to chemotherapy 
resistance in tumor cells. 

Exemplary approaches for detecting MOAT nucleic acid 
or polypeptides/proteins include: 

a) comparing the sequence of nucleic acid in the 
sample with the MOAT nucleic acid sequence to determine 
whether the sample from the patient contains mutations; or

b) determining the presence, in a sample from a 
patient, of the polypeptide encoded by the MOAT gene and,
if present, determining whether the polypeptide is full 
length, and/or is mutated, and/or is expressed at the 
normal level; or

c) using DNA restriction mapping to compare the 
restriction pattern produced when a restriction enzyme 
cuts a sample of nucleic acid from the patient with the 
restriction pattern obtained from normal MOAT gene or from 
known mutations thereof; or, 

d) using a specific binding member capable of binding 
to a MOAT nucleic acid sequence (either normal sequence or 
known mutated sequence), the specific binding member
comprising nucleic acid hybridizable with the MOAT 
sequence, or substances comprising an antibody domain with 
specificity for a native or mutated MOAT nucleic acid 
sequence or the polypeptide encoded by it, the specific 
binding member being labelled so that binding of the 
specific binding member to its binding partner is 
detectable; or, 

e) using PCR involving one or more primers based on 
normal or mutated MOAT gene sequence to screen for normal 
or mutant MOAT gene in a sample from a patient. 
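
Approach (c) above, comparison of restriction patterns, may be sketched computationally as follows. The Python example locates every occurrence of a recognition site in a linear sequence, cuts at a fixed offset within each site, and returns the sorted fragment lengths so that two samples can be compared. EcoRI (recognition site GAATTC, cutting after the first G) is used as a familiar example enzyme; the function name and the toy sequences are assumptions for illustration and do not represent MOAT sequences or any mutation disclosed herein.

```python
def fragment_lengths(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Sorted fragment lengths after cutting a linear sequence at each site.

    cut_offset is the number of bases into the site at which the strand is cut
    (1 for EcoRI, which cuts between G and AATTC).
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]))


# Hypothetical toy sequences: the second copy of the site is altered in MUTANT.
NORMAL = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
MUTANT = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TT"

pattern_differs = fragment_lengths(NORMAL) != fragment_lengths(MUTANT)
print(fragment_lengths(NORMAL), fragment_lengths(MUTANT), pattern_differs)
```

A difference in fragment pattern indicates only that a site has been gained or lost; sequence changes outside recognition sites would not be detected by this comparison.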

A "specific binding pair" comprises a specific 
binding member (sbm) and a binding partner (bp) which have 
a particular specificity for each other and which in 
normal conditions bind to each other in preference to 
other molecules. Examples of specific binding pairs are 
antigens and antibodies, ligands and receptors and 
complementary nucleotide sequences. The skilled person is 
aware of many other examples and they do not need to be 
listed here. Further, the term "specific binding pair" is 
also applicable where either or both of the specific 
binding member and the binding partner comprise a part of 
a large molecule. In embodiments in which the specific
binding pair are nucleic acid sequences, they will be of a
length to hybridize to each other under conditions of the 
assay, preferably greater than 10 nucleotides long, more 
preferably greater than 15 or 20 nucleotides long. 

In most embodiments for screening for alleles giving 
rise to chemotherapy resistance, the MOAT nucleic acid in
a biological sample will initially be amplified, e.g., using
PCR, to increase the amount of the analyte as compared to 
other sequences present in the sample. This allows the 
target sequences to be detected with a high degree of 
sensitivity if they are present in the sample. This 
initial step may be avoided by using highly sensitive 
array techniques that are becoming increasingly important 
in the art. 

The identification of the MOAT gene and its 
association with a particular chemotherapy resistance 
paves the way for aspects of the present invention to 
provide the use of materials and methods, such as are 
disclosed and discussed above, for establishing the 
presence or absence in a test sample of a variant form of 
the gene, in particular an allele or variant specifically 
associated with chemotherapy resistance. This may be done 
to assess the propensity of the tumor to exhibit 
chemotherapy resistance. 

In still further embodiments, the present invention 
concerns immunodetection methods for binding, purifying, 
removing, quantifying or otherwise generally detecting 
biological components. The encoded proteins or peptides of 
the present invention may be employed to detect antibodies 
having reactivity therewith, or, alternatively, antibodies 
prepared in accordance with the present invention, may be 
employed to detect the encoded proteins or peptides. The 
steps of various useful immunodetection methods have been
described in the scientific literature, such as, e.g.,
Nakamura et al. (1987).

In general, the immunobinding methods include 
obtaining a sample suspected of containing a protein, 
peptide or antibody, and contacting the sample with an 
antibody or protein or peptide in accordance with the 
present invention, as the case may be, under conditions 
effective to allow the formation of immunocomplexes.

The immunobinding methods include methods for 
detecting or quantifying the amount of a reactive 
component in a sample, which methods require the detection 
or quantitation of any immune complexes formed during the 
binding process. Here, one would obtain a sample 
suspected of containing a MOAT gene encoded protein, 
peptide or a corresponding antibody, and contact the 
sample with an antibody or encoded protein or peptide, as 
the case may be, and then detect or quantify the amount of 
immune complexes formed under the specific conditions. 

In terms of antigen detection, the biological sample 
analyzed may be any sample that is suspected of containing 
the MOAT antigen, such as a tumor tissue section or 
specimen, a homogenized tissue extract, an isolated cell, 
a cell membrane preparation, or separated or purified forms
of any of the above protein-containing compositions.

Contacting the chosen biological sample with the 
protein, peptide or antibody under conditions effective 
and for a period of time sufficient to allow the formation 
of immune complexes (primary immune complexes) is 
generally a matter of simply adding the composition to the 
sample and incubating the mixture for a period of time 
long enough for the antibodies to form immune complexes 
with, i.e., to bind to, any antigens present. After this
time, the sample-antibody composition, such as a tissue
section, ELISA plate, dot blot or Western blot, will
generally be washed to remove any non-specifically bound
antibody species, allowing only those antibodies 
specifically bound within the primary immune complexes to 
be detected. 

In general, the detection of immunocomplex formation
is well known in the art and may be achieved through the 
application of numerous approaches. These methods are 
generally based upon the detection of a label or marker, 
such as any radioactive, fluorescent, biological or 
enzymatic tags or labels of standard use in the art. U.S. 
Patents concerning the use of such labels include U.S. 
Pat. Nos. 3,817,837; 3,850,752; 3,939,350; 3,996,345; 
4,277,437; 4,275,149 and 4,366,241, each incorporated 
herein by reference. Of course, one may find additional 
advantages through the use of a secondary binding ligand 
such as a second antibody or a biotin/avidin ligand 
binding arrangement, as is known in the art. 

In one broad aspect, the present invention 
encompasses kits for use in detecting expression of MOAT 
encoding nucleic acids in biological samples, including 
biopsy samples. Such a kit may comprise one or more pairs 
of primers for amplifying nucleic acids corresponding to 
the MOAT gene. The kit may further comprise samples of 
total mRNA derived from tissues expressing at least one or 
a subset of the MOAT genes of the invention, to be used as 
controls. The kit may also comprise buffers, nucleotide 
bases, and other compositions to be used in hybridization 
and/or amplification reactions. Each solution or 
composition may be contained in a vial or bottle and all 
vials held in close confinement in a box for commercial 
sale. In a further embodiment, the invention encompasses 
a kit for use in detecting MOAT proteins in chemotherapy 
resistant cancer cells comprising antibodies specific for
MOAT proteins encoded by the MOAT nucleic acids of the 
present invention. 

Another aspect of the present invention comprises 
screening methods employing host cells expressing one or 
more MOAT genes of the invention. An advantage of having 
discovered the complete coding sequences of MOAT-B through MOAT-E
is that cell lines that overexpress MOAT-B, -C, -D or -E can be
generated using standard transfection protocols. Cells 
that overexpress the complete cDNA will also harbor the 
complete proteins, a feature that is essential for 
biological activity of proteins. The overexpressing cell 
lines will be useful in several ways: 1) The drug
sensitivity of overexpressing cell lines can be tested 
with a variety of known anticancer agents in order to 
determine the spectrum of anticancer agents for which the 
transporter confers resistance; 2) The drug sensitivity of 
overexpressing cell lines can be used to 
determine whether newly discovered anticancer agents are 
transported out of the cell by one of the discovered 
transporters; 3) Overexpressing cell lines can be used to
identify potential inhibitors that reduce the activity of 
the transporters. Such inhibitors are of great 
clinical interest in that they may enhance the activity of 
known anticancer agents, thereby increasing their 
effectiveness. Reduced activity will be detected by 
restoration of anticancer drug sensitivity, or by 
reduction of transporter mediated cellular efflux of 
anticancer agents. In vitro biochemical studies designed 
to identify reduced transporter activity in 
the presence of potential inhibitors can also be performed 
using membranes prepared from overexpressing cell lines;
and 4) Overexpressing cell lines can also be used to
determine whether pharmaceutical agents that are not
anticancer agents are transported out of the cell by the
transporters.



The following protocols are provided to facilitate 
the practice of the present invention. 

Isolation of MOAT-B cDNA

Forward {CT(A/G/T) GT(A/G/T) GC(A/G/T) GT(A/G/T)
GT(A/G/T) GG(A/G/C/T)} (SEQ ID NO: 9) and reverse {(G/A)CT
(A/G/C/T)A(A/G/C) (A/G/C/T)GC (A/G/C/T)(G/C)(T/A)
(A/G/C/T)A(A/G) (A/G/C/T)GG (A/G/C/T)TC (A/G)TC} (SEQ ID
NO: 16) degenerate oligonucleotide primers were designed
based upon the first nucleotide binding folds of human
MRP, CFTR, and MDR1. Bacteriophage DNA isolated from a
C200 cDNA library prepared in the λpCEV27 phagemid
vector (17) was used as template in PCR reactions
containing 250 ng cDNA, 5 µM primers, 50 mM KCl, 10 mM
Tris-HCl, pH 8.3, 3 mM MgCl2, 0.05% gelatin, 0.2 mM dNTP
and Taq polymerase (Perkin Elmer Cetus). Five cycles of
PCR were performed as follows: 94°C for 1 minute, 40°C 
for 2 minutes, 72°C for 3 minutes. Twenty five cycles 
were then performed as follows: 94°C for 1 minute, 55°C 
for 1 minute, and 72°C for 1 minute. The resulting 
reaction products were used as template in a second 
round of PCR, as described above, with nested forward 
{CGGGATCC AG(A/G) GA(A/G) AA(C/T) AT(A/C/T) CT(A/G/C/T)
TTT GG(A/G/C/T)} (SEQ ID NO: 17) and reverse {CGGAATTC
(A/G/T/C)TC (A/G)TC (A/C/T)AG (A/G/C/T)AG (A/G)TA
(A/T/G)AT (A/G)TC} (SEQ ID NO: 18) degenerate
oligonucleotide primers. PCR reaction products were
isolated from an agarose gel and subcloned into the
BamHI and EcoRI sites of pBluescript (Stratagene).
Nucleotide sequence analysis 
was performed on plasmid DNA prepared from ampicillin
resistant transformants. Additional cDNA clones were
isolated from C200 (ovary) and B5 (breast) cDNA libraries 
by plaque hybridization using the PCR product as the 
initial radiolabeled probe. 
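
The degenerate primers above are written with bracketed alternatives at the variable positions. As a rough illustration only (it is not part of the original protocol), the following Python sketch parses that notation and counts how many distinct oligonucleotides a primer mixture represents. The helper names `parse_degenerate`, `degeneracy` and `expand` are hypothetical and introduced here for illustration.

```python
import re
from itertools import product

def parse_degenerate(primer: str) -> list[str]:
    """Split a primer such as 'CT(A/G/T)GG' into the allowed bases per position."""
    compact = primer.replace(" ", "")
    # Each token is either a bracketed set of alternatives or a single fixed base.
    tokens = re.findall(r"\(([ACGT/]+)\)|([ACGT])", compact)
    return [alts.replace("/", "") if alts else base for alts, base in tokens]

def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by the degenerate primer."""
    count = 1
    for choices in parse_degenerate(primer):
        count *= len(choices)
    return count

def expand(primer: str) -> list[str]:
    """Enumerate every sequence in the mixture (only sensible for small mixtures)."""
    return ["".join(bases) for bases in product(*parse_degenerate(primer))]

# First-round forward primer as written in the text.
FORWARD = "CT(A/G/T) GT(A/G/T) GC(A/G/T) GT(A/G/T) GT(A/G/T) GG(A/G/C/T)"

if __name__ == "__main__":
    print(len(parse_degenerate(FORWARD)), "positions")
    print(degeneracy(FORWARD), "sequences in the mixture")
```

A count of this kind gives a feel for how dilute each individual sequence is within the stated primer concentration.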

RNA Blot Analysis 

Blots containing poly(A)+ RNA isolated from human
tissues (Clontech) were prehybridized at 45°C for 8 hours
in 50% formamide, 4X SSC, 4X Denhardt's solution, 0.04 M
sodium phosphate monobasic, pH 6.5, 0.8% (w/v) glycine,
0.1 mg/ml sheared denatured salmon sperm DNA.
Hybridization was performed at 45°C with 32P-labeled MOAT-B
or GAPDH probes in a solution containing 50% formamide, 3X
SSC, 0.04 M sodium phosphate pH 6.5, 10% dextran sulfate,
0.1 mg/ml sheared denatured salmon sperm DNA. Blots were
washed 2 times for 15 min at 65°C in 2X SSC, 5 mM Tris-HCl
pH 7.4, 0.5% SDS, 2.5 mM EDTA, 0.1% sodium pyrophosphate pH
8.0, and subsequently washed 2 times for 15 min in 0.1X
SSC. Blots were then subjected to autoradiography.

Chromosomal localization 

Preparation of metaphase spreads from 
phytohemagglutinin-stimulated lymphocytes of a healthy
female donor, and fluorescence in situ hybridization and 
detection of immunofluorescence were carried out as 
previously described (18). A 2.2-kb cDNA clone of MOAT-B 
inserted in pBluescript was biotinylated by nick 
translation in a reaction containing 1 µg DNA, 20 µM each
of dATP, dCTP and dGTP, 1 µM dTTP, 25 mM Tris-HCl, pH 7.5,
5 mM MgCl2, 10 mM β-mercaptoethanol, 10 µM biotin-16-dUTP
(Boehringer Mannheim), 2 units DNA polymerase I/DNase I
(GIBCO, BRL) and water to a total volume of 50 µl. The
probe was denatured and hybridized to metaphase spreads
overnight at 37°C. Hybridization sites were detected with
fluorescein-labeled avidin (Oncor) and amplified by
addition of anti-avidin antibody (Oncor) and a second
layer of fluorescein-labeled avidin. The chromosome
preparations were counterstained with DAPI and observed
with a Zeiss Axiophot epifluorescence microscope equipped
with a cooled charge coupled device camera (Photometrics,
Tucson AZ) operated by a Macintosh computer work station.
Digitized images of DAPI staining and fluorescein signals 
were captured, pseudo-colored and merged using Oncor Image 
version 1.6 software. 

Isolation of MOAT-C and MOAT-D cDNA

MOAT-C and MOAT-D cDNA clones were isolated by plaque
hybridization from bacteriophage cDNA libraries using the
I.M.A.G.E. clones as the initial probes (ATCC).

RNA blot analysis 

Blots containing poly(A)+ RNA isolated from human
tissues were purchased from Clontech and
hybridized with radiolabeled MOAT-C, MOAT-D or actin
probes according to the manufacturer's directions.

Chromosomal localization 

Preparation of metaphase spreads from 
phytohemagglutinin-stimulated lymphocytes of a healthy
female donor, and fluorescence in situ hybridization and
detection of immunofluorescence were carried out as
previously described (18). A MOAT-C probe inserted in
pBluescript, or MOAT-D probe inserted in pBluescript, was
biotinylated by nick translation in a reaction containing
1 µg DNA, 20 µM each of dATP, dCTP and dGTP, 1 µM dTTP, 25
mM Tris-HCl, pH 7.5, 5 mM MgCl2, 10 mM β-mercaptoethanol,
10 µM biotin-16-dUTP (Boehringer Mannheim), 2 units DNA
polymerase I/DNase I (GIBCO, BRL) and water to a total
volume of 50 µl. The probe was denatured and hybridized
to metaphase spreads overnight at 37°C. Hybridization
sites were detected with fluorescein-labeled avidin
(Oncor) and amplified by addition of anti-avidin antibody
(Oncor) and a second layer of fluorescein-labeled avidin.
The chromosome preparations were counterstained with DAPI
and observed with a Zeiss Axiophot epifluorescence
microscope equipped with a cooled charge coupled device
camera (Photometrics, Tucson AZ) operated by a Macintosh
computer work station. Digitized images of DAPI staining 
and fluorescein signals were captured, pseudo-colored and 
merged using Oncor Image version 1.6 software. 

The following examples are provided to illustrate 
various embodiments of the invention. They are not 
intended to limit the invention in any way. 

EXAMPLE I 
Isolation of MOAT-B cDNA.

A degenerate PCR approach was used to isolate 
MRP-related transporters. Degenerate oligonucleotide 
primers were prepared based upon the N-terminal nucleotide 
binding folds of MRP and other eukaryotic transporters, 
and used in conjunction with DNA prepared from an ovarian 
cancer cell line bacteriophage library. Nucleotide 
sequence analysis of one of the resulting PCR products 
indicated that it encoded a segment of a novel nucleotide 
binding fold that was most closely related to MRP and 
cMOAT. Overlapping cDNA clones were isolated from ovarian
and breast bacteriophage libraries by plaque hybridization 
using the PCR product as the initial probe. A total of 
5.9 kb of cDNA was isolated. Nucleotide sequence analysis
revealed two classes of cDNA clones that were about
equally represented among isolates from each of the two
bacteriophage libraries. The first class contained an
open reading frame of 3975 bp that was bordered by in
frame stop codons located at positions -76 and -42
(relative to the putative initiation codon) and 3976, and
encoded a predicted protein of 1325 amino acids, which is
designated MOAT-B. The open reading frame was followed by
approximately 2 kb of 3' untranslated sequences. The most
upstream ATG in the open reading frame was located in the
sequence context CAAGATGC (positions -4 to +4). The A at
position -3 relative to the
putative translation initiation codon was in agreement
with the major feature of the Kozak consensus sequence,
but the C at position +4 was divergent from the more usual
G. The second class of cDNA clones was identical to the
first with the exception of a single nucleotide. These
clones harbored an additional T following nucleotide 3872
of the first class of clones, close to the C-terminus of
the predicted protein. This additional nucleotide 
resulted in a frame shift such that the predicted protein 
of the second class of cDNA clones was 22 residues shorter 
than that of the first class of cDNA clones, and in which 
the C-terminal 34 residues of the latter reading frame 
were replaced by 12 distinct residues. See brief 
description of Figure 1. 
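
The size relationships stated above follow from simple codon arithmetic. The short Python sketch below, offered only as an illustrative check, reproduces them; the function name `protein_length` is a hypothetical helper, and the open reading frame length is assumed to exclude the stop codon, as the stop position of 3976 given in the text implies.

```python
def protein_length(orf_bp: int) -> int:
    """Number of codons (amino acids) in an ORF whose length excludes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_bp // 3

# First class of clones: 3975 bp open reading frame.
full_length = protein_length(3975)

# Second class: the frame shift replaces the C-terminal 34 residues
# of the first reading frame with 12 distinct residues.
variant_length = full_length - 34 + 12

if __name__ == "__main__":
    print(full_length)                   # 1325
    print(full_length - variant_length)  # 22
```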

Analysis of the MOAT-B Predicted Structure. 

Comparison of the MOAT-B predicted protein with 
complete coding sequences in protein data bases using the 
BLAST program indicated that it shared significant 
similarity with several eukaryotic ABC transporters. 
See Table I.

Table I. Comparison of peptide domains of MOAT-B with
those of other eukaryotic ABC transporters

MOAT-B domain (peptide):  NBF1 (428-576); linker (577-705); TM2 (706-992);
                          NBF2 (1058-1216); C-terminus (1217-1325); overall identity

Percent identity:

57.2   55.3
39.2   38.9
28.1   17.6
42.8   40.3
32.9   23.3

The indicated domains are: TM1, segment containing the
transmembrane spanning domain N-terminal to NBF1; NBF1 and NBF2,
nucleotide binding folds 1 and 2; linker region, segment located
between NBF1 and TM2; TM2, segment containing the transmembrane spanning
domain located between the two NBFs; C-terminus, segment between NBF2
and the C-terminus of the proteins. Sequence alignments were generated
using the PILEUP program of the GCG package. Percent amino acid
identity with MOAT-B domains is shown.
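
For readers unfamiliar with how a percent identity figure is derived from an alignment, the following Python sketch shows one common convention: identical residues divided by aligned (non-gap) columns. This is an assumption made for illustration; the exact denominator used by the PILEUP and GAP programs is not stated in the text. The function name `percent_identity` and the two short aligned sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # gap columns are excluded under this convention
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

if __name__ == "__main__":
    # Toy pre-aligned peptides, not taken from any MOAT sequence.
    print(round(percent_identity("GKSTLL-A", "GKSSLLQA"), 1))
```

Different denominators (alignment length, shorter sequence length) give slightly different values, which is one reason figures from different programs need not agree exactly.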



Typical features of eukaryotic ABC transporters were 
present in the predicted MOAT-B protein. See Figure 1. 
Overall the protein was composed of a tandem repeat of a 
nucleotide binding fold appended C-terminal to a
hydrophobic domain that contained several potential
transmembrane spanning helices. Conserved Walker A and B
ATP binding sites were present in each of the nucleotide 
binding folds. See Figure 2A. In addition, a conserved C 
motif, the signature sequence of ABC transporters, was 
present in each nucleotide binding fold. Analysis of 
potential transmembrane motifs using the TMAP program (19) 
and an input sequence alignment of MOAT-B and MOAT-C, a 
transporter highly related to MOAT-B 4 , predicted 12 
transmembrane helices with 6 transmembrane segments in 
each of the two hydrophobic domains. This 6 + 6
configuration of predicted transmembrane helices is in 
agreement with topological models proposed for MRP and 
other ABC transporters (20, 21), and is shown in Figure 1. 
However, alternative predictions of transmembrane segments 
were obtained using different program parameters or input 
sequence alignments. For example, when the TMAP program 
was used with an input sequence alignment consisting of 
human MRP, rat cMOAT, rat sulfonylurea receptor (SUR),
human cystic fibrosis transmembrane conductance regulator (CFTR) and
human P-glycoprotein, a 6+5 configuration was
predicted. The only substantial difference between the
latter prediction and the structure shown in Figure 1 is
that transmembrane segments 9 (829-853) and 10 (855-878)
were replaced by a single predicted transmembrane segment
spanning amino acids 847-875.
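
The conserved motifs mentioned above can be located with simple pattern matching. The Python sketch below is illustrative only: it assumes the widely used Walker A consensus G-x(4)-G-K-[ST] and the canonical ABC signature LSGGQ, neither of which is spelled out in the text, and it runs on a made-up peptide rather than on a MOAT sequence. The name `find_motifs` is hypothetical.

```python
import re

# Assumed consensus patterns (general knowledge, not taken from this document).
MOTIFS = {
    "Walker A": r"G.{4}GK[ST]",
    "ABC signature": r"LSGGQ",
}

def find_motifs(protein: str) -> list[tuple[str, int, str]]:
    """Return (motif name, 1-based start position, matched text) for each hit."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in re.finditer(pattern, protein):
            hits.append((name, match.start() + 1, match.group()))
    return sorted(hits, key=lambda hit: hit[1])

if __name__ == "__main__":
    toy = "MAAGRTGSGKSSLLNAAALSGGQRQRVDE"  # hypothetical peptide
    for name, position, text in find_motifs(toy):
        print(name, position, text)
```

Real nucleotide binding folds often deviate from these strict consensus strings, so a scan of this kind is a starting point rather than a substitute for alignment.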

Among ABC transporters, the degree of similarity of 
the nucleotide binding folds is considered to be the best 
indicator of functional conservation. Comparison of the 
nucleotide binding folds of MOAT-B with other eukaryotic 
ABC transporters indicated that it was most closely 
related to MRP, the yeast cadmium resistance protein 
(YCF1) and cMOAT (Table I), three transporters that have 
organic anions as substrates. The MOAT-B NBF1 was 55.6, 
56.0 and 53.3 percent identical, and the MOAT-B NBF2 was 
61.6, 57.2 and 55.3 percent identical to the first and 
second nucleotide binding folds of human MRP, YCF1 and 
human cMOAT, respectively. Aside from the latter 
transporters, the MOAT-B nucleotide binding folds were 
most closely related to those of CFTR and SUR. The MOAT-B 
nucleotide binding folds shared significantly less 
similarity with those of MDR1. Alignment of the MOAT-B
nucleotide binding folds with those of other eukaryotic
transporters is shown in Figure 2A. Analysis of the
overall amino acid identity of MOAT-B with other ABC
transporters also indicated that it was most closely
related to MRP, YCF1 and cMOAT (Table I). Overall MOAT-B
was 39.2, 38.9 and 38 percent identical to these 
transporters, respectively. Figure 2B shows a comparison 
of the hydropathy profiles of MOAT-B with those of other 
eukaryotic transporters. This comparison reveals that 
MOAT-B (1325 amino acids) is approximately 200 amino acids 
smaller than MRP (1531 residues), cMOAT (1545 residues) 
and YCF1 (1515 residues), and that this size difference is 
largely accounted for by the absence in MOAT-B of an amino 
terminal hydrophobic extension that is present in MRP, 
cMOAT and YCF1 (22). This N-terminal hydrophobic segment
is predicted to harbor several transmembrane spanning
segments, and is also present in SUR.

Expression Pattern of MOAT-B in Human Tissues.

To gain insight into the possible function of MOAT-B, 
its expression pattern in a variety of human tissues was 
examined by RNA blot analysis. As shown in Figure 3, a 
MOAT-B transcript of approximately 6 kb was readily
detected. The isolation of 5.9 kb of MOAT-B cDNA was
consistent with this size. MOAT-B expression was detected
in each of the 16 tissues analyzed. Transcript levels were
highest in prostate and lowest in liver and peripheral
blood leukocytes, for which prolonged exposure of film
was required to detect expression. Intermediate levels of
expression were observed in other tissues.

Chromosomal Localization of the MOAT-B Gene.

The MOAT-B chromosomal localization was determined by 
fluorescence in situ hybridization. As shown in Figure 4, 
hybridization of the MOAT-B probe to metaphase spreads 
revealed specific labeling at human chromosome band 13q32. 
Fluorescent signals were detected on chromosome 13 in each
of 19 metaphase spreads scored. Of 135 signals observed,
62 (46%) were on 13q. Among these signals, 61 localized
at 13q32, near the boundary between 13q31 and 13q32.
Paired (on sister chromatids) signals were only seen at
band 13q32. In several metaphases, signals on a single
chromatid were observed at chromosome bands 6p21 or 4q21, 
suggesting hybridization to distantly related sequences. 
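
The percentages quoted for the hybridization signals can be reproduced from the raw counts. The Python lines below are a minimal check using only the numbers given above; the function name `signal_percent` is a hypothetical helper.

```python
def signal_percent(on_target: int, total: int) -> float:
    """Percentage of scored signals that fall on the region of interest."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * on_target / total

if __name__ == "__main__":
    # 62 of 135 signals were on 13q; 61 of those 62 localized at 13q32.
    print(round(signal_percent(62, 135)))    # 46
    print(round(signal_percent(61, 62), 1))  # 98.4
```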

EXAMPLE II 
Isolation of MOAT-C and MOAT-D cDNA. 

Isolation of the MOAT-B 4 transporter as described 
above suggested the possibility that there were other 
MRP/cMOAT-related transporters. A BLAST search (36) of
the nonredundant expressed sequence tag data base using
MRP and related yeast transporters revealed two clones 
with significant similarity to MRP and cMOAT. The first 
of these sequences (I.M.A.G.E. consortium clone 113196) 
was 1.2 kb in length, 800 bp of which encoded an 
MRP-related peptide. A segment of this clone was used as 
a probe to screen ovarian and hematopoietic bacteriophage 
libraries. Analysis of these cDNA clones indicated that 
they contained approximately 2 kb of additional coding 
sequence not present in clone 113196. An additional 1655 
bp of 5' sequence was obtained by several rounds of RACE
using the bacteriophage DNA prepared from the ovarian cDNA 
library as template. The continuity of the sequences 
obtained by RACE with the cDNA clones isolated from 
bacteriophage libraries was confirmed by nucleotide 
sequence analysis of a 2 kb product obtained by RT/PCR 
using an upstream oligonucleotide primer located at the 5'
end of the RACE sequence and a downstream primer located
at the 5' end of the cDNA obtained by plaque
hybridization. A total of approximately 5.9 kb of cDNA
sequence was isolated. Nucleotide sequence analysis
revealed an open reading frame of 4311 bp that was
preceded by an in frame stop codon located at position
-93 (relative to the putative initiation codon), and
encoded a predicted protein of 1437 amino acids, which is
designated MOAT-C herein. The open reading frame was
followed by approximately 1.4 kb of 3' untranslated
sequences in which a polyadenylation sequence (AAUAAA) was
located 20 bp upstream of the poly(A) tail. The most
upstream ATG in the open reading frame was located in the
sequence context GAAGATGA (positions -4 to +4). The A at position -3 of the
putative translation initiation codon was in agreement 
with the major feature of the Kozak consensus sequence, 
but the A at position +4 was divergent from the more usual 
G (37). The second sequence identified in our data base
search (I.M.A.G.E. consortium clone 208097) was 1.2 kb in 
length, of which 588 bp encoded an MRP-related peptide. A 
segment of this clone was used as a probe to screen liver 
and monocyte bacteriophage cDNA libraries, and 5' cDNA 
segments of the isolated cDNA clones were used in a 
subsequent round of screening. Together approximately 5.2 
kb of cDNA sequence were isolated. Nucleotide sequence 
analysis revealed an open reading frame of 4570 bp, which 
is designated MOAT-D herein. The open reading frame was 
followed by approximately 0.6 kb of 3' untranslated 
sequences in which a polyadenylation sequence (AAUAAA) was 
located 12 bp upstream of the poly(A) tail. An upstream
in frame stop codon was not present in the MOAT-D cDNA 
clones, and attempts to obtain additional upstream 
sequences by RACE using as template cDNA prepared from 
sources in which MOAT-D is abundant were not successful. 
The most upstream ATG in the open reading frame 
(nucleotide position 5-7), located in the sequence context
ATGGATGG (positions -4 to +4), was therefore designated as the translational
initiation site. The G at position +4 was in good
agreement with the Kozak consensus sequence, but the T at
-3 was divergent from the more usual A (37). Although an
upstream in frame stop codon was not identified in the
MOAT-D cDNA clones, the size of the encoded protein was
within one amino acid of the size of the transporter with
which it shares the highest degree of identity (MRP),
suggesting that the complete MOAT-D open reading frame was 
present in the isolated cDNA clones. 
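
The initiation-site comparisons made for MOAT-B, MOAT-C and MOAT-D all rest on two positions: -3 (usually A) and +4 (usually G), counted with the A of the ATG as +1 and no position 0. The Python sketch below applies that check to the three eight-nucleotide contexts quoted in the text (CAAGATGC, GAAGATGA and ATGGATGG). The function name `kozak_features` is a hypothetical helper, and the check is deliberately limited to the two positions the text discusses.

```python
def kozak_features(context: str) -> dict:
    """Inspect an 8-nt context spanning positions -4..+4 around an ATG start codon."""
    context = context.upper()
    if len(context) != 8 or context[4:7] != "ATG":
        raise ValueError("expected 8 nt with ATG at positions +1 to +3")
    minus3 = context[1]  # index 0 is -4, so index 1 is -3
    plus4 = context[7]   # index 4 is +1, so index 7 is +4
    return {
        "minus3": minus3,
        "plus4": plus4,
        "minus3_is_A": minus3 == "A",
        "plus4_is_G": plus4 == "G",
    }

CONTEXTS = {"MOAT-B": "CAAGATGC", "MOAT-C": "GAAGATGA", "MOAT-D": "ATGGATGG"}

if __name__ == "__main__":
    for name, context in CONTEXTS.items():
        print(name, kozak_features(context))
```

Under this check MOAT-B and MOAT-C match at -3 but not at +4, while MOAT-D matches at +4 but not at -3, in line with the description above.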



Analysis of the MOAT-C and MOAT-D Predicted Proteins. 

Comparison of the MOAT-C and MOAT-D predicted 
proteins with complete coding sequences in protein data 
bases using the BLAST program indicated that they shared 
significant similarity with several eukaryotic ABC 
transporters. Typical features of eukaryotic ABC 
transporters were present in the predicted proteins. See 
Figure 5. Overall the proteins were composed of 
hydrophobic domains containing potential transmembrane 
spanning helices and two nucleotide binding folds. 
Conserved Walker A and B ATP binding sites, as well as a
conserved C motif, the signature sequence of ABC
transporters, were present in the nucleotide binding folds.
Computer assisted analysis of potential transmembrane 
helices of MOAT-C using the TMAP program (19) predicted 12 
transmembrane helices with 6 transmembrane spanning 
helices in each of two membrane spanning domains. This 6 
+ 6 (TM1-TM6 and TM7-TM12) configuration of predicted 
transmembrane helices is in agreement with topological 
models proposed for several other ABC transporters (20, 
21), and is shown in Figure 5. However, alternative 
predictions of transmembrane segments were obtained using
different program parameters or input sequence alignments. 
Comparison of the hydropathy profiles of MOAT-C with other 
MRP/cMOAT-related transporters (Fig. 6B) indicates that 
its structure is similar to that of MOAT-B, which also has 
two membrane spanning domains. 
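
A hydropathy profile of the kind compared here is a sliding-window average of per-residue hydrophobicity values. The Python sketch below uses the Kyte-Doolittle scale and a 19-residue default window as assumptions; the text does not state which scale or window produced the published figures. The function name `hydropathy_profile` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not specified in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each full window along the sequence."""
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in protein.upper()]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]

if __name__ == "__main__":
    # Toy sequence: a hydrophobic stretch followed by a charged stretch.
    profile = hydropathy_profile("L" * 10 + "D" * 10, window=5)
    print(profile[0], profile[-1])
```

Stretches where the windowed average stays strongly positive over roughly 20 residues are the usual candidates for transmembrane helices, which is how an additional N-terminal hydrophobic domain shows up in such a plot.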

In contrast to MOAT-C, hydrophobicity analysis of 
MOAT-D indicated that it has three membrane spanning 
domains. Similar to MRP, cMOAT and the yeast cadmium 
resistance factor 1 (YCF1), MOAT-D has an additional
N-terminal hydrophobic domain that is not present in
MOAT-B or MOAT-C (Figs. 5 and 6). A 5+6+6 configuration
of transmembrane spanning helices has been proposed for
MRP (38), in which the N-terminal extension harbors 5
transmembrane spanning helices, and 6 transmembrane
helices are present in each of the second and third membrane
spanning domains. An alignment of the MOAT-D predicted
protein with MRP using the GAP program indicated that 
proposed MRP transmembrane spanning helices were conserved 
in MOAT-D. This 5+6+6 model for MOAT-D is shown in Fig. 5. 
Another configuration of transmembrane spanning helices 
(5+6+4) was predicted using computer assisted analysis. 
MRP has been reported to have two N-linked glycosylation 
sites in its N-terminus (Asn-19 and Asn-23) and another 
site located between the first and second transmembrane 
spanning helix of its third membrane spanning domain 
(Asn-1006). The alignment of MOAT-D with MRP indicated
that an N-terminal (Asn-21) and a distal N-glycosylation
site (Asn-1008/1009) were conserved in analogous
positions in MOAT-D. Only the distal N-glycosylation site
of MRP is conserved in MOAT-C (Asn-890) (Fig. 5) and MOAT-B
(Asn-746/754).
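
Candidate N-linked glycosylation sites such as those discussed above are conventionally found by searching for the sequon Asn-X-Ser/Thr, where X is any residue except proline. That rule is general knowledge rather than something stated in this document, and a sequon only marks a potential site. The Python sketch below scans a made-up peptide; the function name `find_sequons` is hypothetical.

```python
import re

def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues in N-X-[S/T] sequons (X is not Pro)."""
    # A lookahead is used so that overlapping sequons are all reported.
    return [match.start() + 1 for match in re.finditer(r"N(?=[^P][ST])", protein)]

if __name__ == "__main__":
    toy = "MANGSAAANPSLLNVTK"  # hypothetical peptide, not a MOAT sequence
    print(find_sequons(toy))
```

Whether a given sequon is actually glycosylated depends on membrane topology, since only residues exposed to the lumen of the endoplasmic reticulum are accessible to the glycosylation machinery.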

Among ABC transporters, the degree of similarity of 
the nucleotide binding folds is considered to be the best 
indicator of functional conservation. Comparison of the 
nucleotide binding folds of MOAT-C and MOAT-D with other 
eukaryotic ABC transporters indicated that they were most 
closely related to those of human MRP, human cMOAT and 
yeast YCF1, three transporters that have organic anions as
substrates. As shown in Table II, among the human
transporters, the MOAT-C NBF1 was about equally related to 
MOAT-D, MRP and cMOAT (55-61% identity), and less similar 
to MOAT-B (49% identity).

Table II. Amino acid identity: nucleotide binding folds 1
and 2 of MRP/cMOAT sub-family members.

          MOAT-C      MOAT-D      MOAT-B      MRP         cMOAT       YCF1

                        % identity (NBF1/NBF2)

MOAT-C    -           57.3/58.9   49.3/59.1   60.0/59.4   61.3/60.6   55.3/58.8

MOAT-D    57.3/58.9   -           55.3/54.1   70.7/73.8   67.3/70.0   52.7/61.3

MOAT-B    49.3/59.1   55.3/54.1   -           57.3/61.6   53.3/55.3   56.0/57.2

MRP       60.0/59.4   70.7/73.8   57.3/61.6   -           66.0/73.1   53.3/63.8

cMOAT     61.3/60.6   67.3/70.0   53.3/55.3   66.0/73.1   -           50.7/61.3

YCF1      55.3/58.8   52.7/61.3   56.0/57.2   53.3/63.8   50.7/61.3   -



The MOAT-C NBF2 shared about equal amino acid identity 
with the five other transporters in this group (59-61% 
identity) . Overall, the MOAT-C protein was about equally 
related to the other five transporters in this group, with 
33.1-36.5% identity. Aside from these
transporters, MOAT-C is most closely related to CFTR, with
which its NBFs shared 44%/42% identity, and SUR, with
which its NBFs shared 49%/51% identity. 

The MOAT-D NBFs were clearly most closely related to 
those of MRP and cMOAT, with which they shared considerable
amino acid identity (67.3-73.8%). See Table III. Of the 
latter two transporters, the MOAT-D NBFs were slightly more 
related to those of MRP. In contrast, the MOAT-D NBFs 
shared only 55.3-58.9% identity with those of MOAT-C and 
MOAT-B. Overall, MOAT-D was again most closely related to 
MRP (57.3%) and cMOAT (46.9%), but significantly more 
related to MRP. Consistent with the analysis of NBFs, 
MOAT-D was much less related to MOAT-C and MOAT-B, with 
which it shared only 33.1% and 35.3% identity, 
respectively. Alignment of the MOAT-C and MOAT-D nucleotide 
binding folds with those of other eukaryotic transporters 
is shown in Fig. 6. 

Table III. Overall amino acid identity among MRP/cMOAT
sub-family members

          MOAT-C   MOAT-D   MOAT-B   MRP      cMOAT    YCF1

                          % identity

MOAT-C    -        33.1     36.5     35.8     36.2     33.6

MOAT-D    33.1     -        35.3     57.3     46.9     38.1

MOAT-B    36.4     35.3     -        39.4     36.8     38.8

MRP       35.8     57.3     39.4     -        48.4     46.4

cMOAT     36.3     46.9     36.8     48.4     -        38.8

YCF1      33.6     38.1     38.8     40.4     38.8     -


Expression Pattern of MOAT-C and MOAT-D in Human Tissues. 

To gain insight into the possible functions of MOAT-C 
and MOAT-D, their expression patterns in a variety of human
tissues were examined by RNA blot analysis. As
shown in Fig. 7 (upper panels), a MOAT-C transcript of
approximately 6.6 kb was readily detected in several
tissues. MOAT-C transcript levels were highest in 
skeletal muscle, with intermediate levels in kidney, 
testes, heart and brain. Low levels were detected in most 
other tissues, including spleen, thymus, prostate, ovary, 
and placenta. Prolonged exposures were required for 
detection in lung and liver. MOAT-D was expressed as an 
approximately 6 kb transcript (middle panels). Compared
to MOAT-C, the MOAT-D expression pattern was more 
restricted. MOAT-D was highly expressed in colon and 
pancreas, with lower levels in liver and kidney. Low 
levels were detected in small intestine, placenta and 
prostate. Prolonged exposures were required to detect 
MOAT-D in testes, thymus, spleen and lung.

Chromosomal localization of the MOAT-C and MOAT-D genes. 

The MOAT-C and MOAT-D chromosomal localizations 
were determined by fluorescence in situ hybridization. As 
shown in Figure 8, hybridization of the MOAT-C probe to 
metaphase spreads revealed specific labeling at human 
chromosome band 3q27. Fluorescent signals were detected 
on chromosome 3q in each of 22 metaphase spreads scored.
Of 75 signals observed, 43 (57%) were on 3q. Paired (on 
sister chromatids) signals were only seen at band 3q27. 
Hybridization of the MOAT-D probe revealed specific 
labeling at human chromosome band 17q21.3. Fluorescent 
signals were detected on chromosome 17 in each of 21 
metaphase spreads scored. Of 83 signals observed, 34 
(41%) were on 17q21.3. Paired (on sister chromatids) 
signals were only seen at band 17q21.3.

EXAMPLE III
Isolation of MOAT-E cDNA.

Analysis of ara, a reported cDNA sequence that 
encodes a 453 amino acid transporter, revealed that it is 
a non-physiological sequence representing a combination 
of 5' MRP sequences fused to an MRP/cMOAT-related 
transporter. The MRP sequences extend to codon 8 of the 
reported predicted protein. 

To isolate the complete physiological cDNA, a RT/PCR 
approach was employed in which primers were designed 
based upon a reported genomic sequence that encodes exons 
identical to the reported ara sequence. The MOAT-E cDNA 
was isolated in three segments. The first segment, 
spanning residues 1-616, was isolated by PCR using 5' 
primer ATGGCCGCGCCTGCTGAGC (SEQ ID NO: 10) and 3' primer
GTCTACGACACCAGGGTCAA (SEQ ID NO: 11). The second
segment, spanning residues 1815-3187, was isolated by PCR
using 5' primer CTGCCTGGAAGAAGTTGACC (SEQ ID NO: 12) and 3'
primer CTGGAATGTCCACGTCAACC (SEQ ID NO: 13). The third
segment, spanning residues 3158-1503, was isolated by PCR
using 5' primer GGAGACAGACACGGTTGACG (SEQ ID NO: 14) and
3' primer GCAGACCAGGCCTGACTCC (SEQ ID NO: 15). The
primers were designed based upon the nucleotide sequence
of human genomic BAC clone CIT987SD-962B4. The template
for these reactions was random-primed human kidney cDNA 
prepared from total RNA. Using this approach the 
physiological cDNA was isolated which is designated 
MOAT-E herein and set forth as Sequence I.D. No. 7. 

Analysis of the MOAT-E Predicted Protein. 

MOAT-E encodes a 1503 amino acid transporter. The 
MOAT-E predicted amino acid sequence is designated 
Sequence I.D. No. 8. See Figure 9. Also shown is the 
location of potential transmembrane helices (overbars), 
potential N-glycosylation site (black dot) and the two 
nucleotide binding folds (NBF1 and NBF2). Walker A and B 
motifs, as well as the signature C motif of ABC 
transporters, are also indicated. Comparison of MOAT-E 
with ara indicates that the ara predicted protein is not 
only a fused sequence, but also that it represents only 
446 (~30%) of the 1503 MOAT-E residues. 
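
The Walker A motif mentioned above is commonly summarized by the consensus G-x(4)-G-K-[ST]. The following minimal Python sketch scans a protein string for that consensus. The regular expression is the general textbook consensus, not a definition taken from this document, and the two test fragments are short stretches transcribed from the MOAT-E figure listing, so they may carry transcription errors.

```python
import re

# Textbook Walker A (P-loop) consensus: G, any four residues, G, K, then S or T.
WALKER_A = re.compile(r"G.{4}GK[ST]")


def find_walker_a(protein):
    """Return (1-based start position, matched residues) for each Walker A hit."""
    return [(m.start() + 1, m.group()) for m in WALKER_A.finditer(protein)]


if __name__ == "__main__":
    # Fragments transcribed from the MOAT-E listing (assumed to be accurate).
    nbf1_fragment = "WGPVGAGKSSLLSALLGEL"
    nbf2_fragment = "GIVGRTGAGKSSLAS"
    print(find_walker_a(nbf1_fragment))
    print(find_walker_a(nbf2_fragment))
```

Positions are reported relative to the fragment passed in, so a full-length sequence is needed to obtain residue numbers comparable to those in the figure.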

Comparison of MOAT-E with the other members of the 
MRP/cMOAT subfamily, which include MRP, cMOAT, MOAT-B, 
MOAT-C and MOAT-D, is shown in Table IV. MOAT-E is 
highly related to MOAT-D, MRP and cMOAT, with which it 
shares 39-45% identity. This high degree of identity is 
also indicated by the high percent identities of the 
nucleotide binding folds, which range from 55-61%. In 
contrast, MOAT-E is less related to MOAT-B and MOAT-C, 
with which it shares ~34% and ~31% identity, respectively. 

Table IV. Amino acid identity among MRP/cMOAT sub-family 
members (a). The bold type indicates the percent identity of 
the overall proteins, and the parentheses indicate the 
percent identity of the nucleotide binding folds. 

                                     % identity (a) 

          MOAT-E       MOAT-B       MOAT-C       MOAT-D       MRP          cMOAT 

MOAT-E    —            33.9         30.6         43.6         45.1         38.9 
          —            (52.0/56.6)  (50.0/52.5)  (59.3/59.4)  (61.3/61.4)  (55.3/59.4) 

MOAT-B    33.9         —            36.4         35.3         39.4         36.8 
          (52.0/56.6)  —            (49.3/59.1)  (55.3/54.1)  (57.3/61.6)  (56.0/57.2) 

MOAT-C    30.0         36.4         —            33.1         35.8         36.2 
          (50.0/52.5)  (49.3/59.1)  —            (57.3/58.9)  (60.6/59.4)  (61.3/60.6) 

MOAT-D    43.6         35.3         33.1         —            57.3         46.9 
          (59.3/59.4)  (55.3/54.1)  (57.3/58.9)  —            (70.7/73.8)  (67.3/70.0) 

MRP       45.1         39.4         35.8         57.3         —            48.4 
          (61.3/61.9)  (57.3/61.6)  (60.0/59.4)  (70.7/73.8)  —            (66.0/73.1) 

cMOAT     38.9         36.8         36.2         46.9         48.4         — 
          (53.1/59.4)  (56.0/57.2)  (61.3/60.6)  (67.3/70.0)  (66.0/73.1)  — 

(a) Overall amino acid identities are indicated in bold-face, and identities of 
nucleotide binding folds 1 and 2 are indicated in parentheses (NBF1/NBF2). 
Percent identity was obtained using the GAP command in the GCG package. 
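
The GAP program performs a global (Needleman-Wunsch) alignment, and percent identity is then read off the aligned sequences. The following minimal Python sketch illustrates that calculation. The scoring values (match +1, mismatch -1, linear gap -2), the function names and the choice to count identity over ungapped aligned columns are assumptions made for illustration; the substitution matrix and gap penalties used to generate Table IV are not stated, so this sketch is not expected to reproduce the tabulated values.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences with a simple linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Percent of ungapped aligned columns in which both residues are identical."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy sequences, hypothetical and chosen only to exercise the functions.
    top, bottom = needleman_wunsch("ACGT", "AGT")
    print(top)
    print(bottom)
    print(percent_identity(top, bottom))
```

For protein comparisons a substitution matrix and affine gap penalties would normally replace the flat scores used here.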

Comparison of the hydropathy profile of MOAT-E with 
other members of the MRP/cMOAT subfamily is shown in 
Figure 10. The data reveal that MOAT-E has a hydrophobic 
N-terminal segment that is present in its closest 
relatives, MOAT-D, MRP and cMOAT. This structural 
feature is present in all of the currently known organic 
anion transporters, and suggests that MOAT-E may share 
substrate specificity with MRP and cMOAT. MOAT-E may 
also share the drug resistance activity of the latter two 
proteins. In contrast, MOAT-B and MOAT-C do not have 
this hydrophobic N-terminal extension. 
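
A hydropathy profile of the kind referred to above is typically a sliding-window average of per-residue hydrophobicity values. The following minimal Python sketch computes such a profile. The Kyte-Doolittle scale and the 19-residue window are assumptions chosen for illustration, since the text does not state which scale or window was used for Figure 10, and the example input is the first 40 residues transcribed from the MOAT-E listing, which may carry transcription errors.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not specified in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Return the mean hydropathy of every full window along the sequence."""
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in protein]
    running = sum(values[:window])
    profile = [running / window]
    for i in range(window, len(values)):
        # Slide the window one residue: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        profile.append(running / window)
    return profile


if __name__ == "__main__":
    # First 40 residues as transcribed from the MOAT-E listing (assumed accurate).
    n_terminus = "MAAPAEPCAGQGVWNQTEPEPAATSLLSLCFLRTAGVWVP"
    for start, value in enumerate(hydropathy_profile(n_terminus), start=1):
        print(start, round(value, 2))
```

Stretches whose windowed average stays strongly positive are the usual candidates for membrane-spanning helices, which is how an N-terminal hydrophobic extension would show up in such a plot.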

Expression Pattern of MOAT-E in Human Tissues. 

In a Northern blot of RNA isolated from various 
tissues, MOAT-E expression is restricted to liver and 
kidney, suggesting that MOAT-E may participate in the 
excretion of substances into the urine and bile. See 
Figure 11. This figure also shows that MOAT-E is 
expressed as an ~6 kb transcript. This is in contrast to 
the ~2.3 kb transcript that was reported for ara, clearly 
indicating that the fused ara transcript is unique to the 
cell line from which it was isolated, and is not a 
physiological transcript. Together, the isolation of 
MOAT-E and analysis of its sequence and expression 
pattern suggest that it may be involved in cellular 
resistance to drugs and/or the excretion of drugs into 
the urine and bile. 
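
The size comparison above can be made explicit with simple arithmetic. The following minimal Python sketch computes the coding length implied by each reported protein size and the fraction of MOAT-E covered by the ara sequence. The function names are hypothetical, and the assumption that one stop codon is added to each open reading frame is ours; untranslated region lengths are not reported in the text.

```python
def coding_length_nt(residues, include_stop=True):
    """Nucleotides needed to encode a protein of the given length."""
    codons = residues + (1 if include_stop else 0)
    return 3 * codons


def percent_of(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Protein lengths and transcript sizes as reported in the text.
    print("MOAT-E ORF:", coding_length_nt(1503), "nt of an ~6 kb transcript")
    print("ara ORF:   ", coding_length_nt(453), "nt of an ~2.3 kb transcript")
    print("ara share of MOAT-E residues:", round(percent_of(446, 1503), 1), "%")
```

Both open reading frames fit within their reported transcripts, and the 446 shared residues correspond to roughly 30% of the 1503-residue MOAT-E protein, in line with the figure given earlier.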



DISCUSSION 

The present invention discloses additional 
MRP/cMOAT-related transporters which were identified by 
using a degenerate PCR cloning approach in which the 
conserved amino terminal ATP-binding domain of known 
eukaryotic transporters was targeted. Using this 
approach the complete coding sequences of MOAT-B, MOAT-C, 
MOAT-D and MOAT-E were obtained. MOAT-B is a protein 
whose predicted structure indicates that it is a member 
of the ABC transporter family. Comparison of the MOAT-B 
predicted protein with other transporters reveals that it 
is most closely related to MRP, cMOAT and yeast YCF1, and 
thus extends the number of known full length MRP-related 
transporters. The similarity of MOAT-B to these 
transporters suggests that it shares a similar substrate 
specificity. Transport assays using membrane vesicle 
preparations indicate that MRP is capable of transporting 
diverse organic anions, including glutathione 
S-conjugates such as LTC4, oxidized glutathione, and 
glucuronidated and sulfated conjugates of steroid 
hormones and bile salts (7). Although membrane vesicle 
transport assays of substrate specificity using 
cMOAT-transfected cells have not yet been reported, 
genetic and biochemical studies using TR- and EHBR rat 
strains, which are defective in the hepatobiliary 
excretion of glutathione and glucuronate conjugates, 
indicate that it is also an ATP-dependent transporter of 
organic anions. cMOAT, which is primarily expressed in 
the canalicular membrane of hepatocytes, has been 
reported to be absent in these rat strains, and 
hepatocyte canalicular membranes prepared from the mutant 
rats are deficient in the ATP-dependent transport of 
glutathione and glucuronate conjugates (23, 24). In 
addition, cMOAT protein has also been reported to be 
absent in the hepatocytes of patients with Dubin-Johnson 
syndrome (25), a disorder manifested by chronic 
conjugated hyperbilirubinemia. YCF1, a yeast 
transporter, has also been demonstrated to transport 
glutathione complexes (26). Thus, based upon the 
similarity of MOAT-B to these three transporters, it is 
possible that it also functions to transport organic 
anions, an activity critical to the cellular 
detoxification of a wide range of xenobiotics. 

MOAT-C, MOAT-D and MOAT-E are three other 
MRP/cMOAT-related transporters. The isolation of these 
three transporters extends the number of known full length 
members of this subfamily to six. Based upon the degree 
of amino acid similarity and overall topology these six 
proteins fall into two groups. The first group is 
composed of MOAT-D, MOAT-E, MRP and cMOAT. These four 
transporters are highly related, sharing ~39-45% amino 
acid identity. MOAT-D is more closely related to MRP 
(57% identity) than is cMOAT (48% identity), and is 
therefore the closest known relative of MRP. In addition 
to a high degree of amino acid identity, the similarity 
between MOAT-D, MRP and cMOAT also extends to overall 
topology. Like MRP and cMOAT, MOAT-D and MOAT-E have 
three membrane spanning domains, including an N-terminal 
hydrophobic extension that is predicted to harbor ~5 
transmembrane helices, and which is absent in 
transporters such as CFTR and MDR1. This N-terminal 
extension is also present in YCF1, a related yeast 
transporter that transports glutathione S-conjugates, and 
SUR, a more distantly related transporter involved in the 
regulation of potassium channels. The second group of 
MRP/cMOAT-related transporters is composed of MOAT-B and 
MOAT-C. These two transporters are distinguished from the 
first group by their lower level of amino acid similarity 
and distinct topology. Like MOAT-D and MOAT-E, MOAT-B 
and MOAT-C are more closely related to MRP (39% and 36%, 
respectively) and cMOAT (37% and 36%, respectively) than 
to other eukaryotic transporters. However, they share 
considerably less similarity with MRP, cMOAT, MOAT-D and 
MOAT-E than the latter four transporters share with each 
other (~39-45% identity). In addition, in contrast to 
MRP, cMOAT, MOAT-D and MOAT-E, MOAT-B and MOAT-C do not 
have an N-terminal membrane spanning domain, and their 
topology is therefore more similar to many other 
eukaryotic ABC transporters that also have only two 
membrane spanning domains. 

Defining the contributions of MOAT-B, MOAT-C, MOAT-D 
and MOAT-E to cytotoxic drug resistance will facilitate 
the design of novel chemotherapeutic agents. The 
multidrug resistance activity of MRP is well described. 
While the drug sensitivity pattern of cMOAT-transfected 
cells has not yet been reported, the possibility that it 
may also confer resistance to cytotoxic drugs is 
suggested by a recent report in which transfection of a 
cMOAT antisense vector was found to enhance the 
sensitivity of a human liver cancer cell line to both 
natural product drugs and cisplatin. Since MOAT-D and 
MOAT-E are more closely related to MRP than is cMOAT, the 
possibility that they will also confer resistance is 
particularly intriguing. The availability of the MOAT-B, 
MOAT-C, MOAT-D and MOAT-E cDNAs will facilitate the 
analysis of their possible contributions to cytotoxic 
resistance. 

While certain of the preferred embodiments of the 
present invention have been described and specifically 
exemplified above, it is not intended that the invention 
be limited to such embodiments. Various modifications 
may be made thereto without departing from the scope and 
spirit of the present invention, as set forth in the 
following claims. 

What is claimed is: 

1. An isolated nucleic acid molecule having the 
sequence of SEQ ID NO: 1, said nucleic acid molecule 
comprising a nucleotide sequence encoding a MOAT-B 
transporter protein about 1350 amino acids in length, 
said encoded transporter protein comprising a 
multi-domain structure including a tandem repeat of 
nucleotide binding folds appended C-terminal to a 
hydrophobic domain, said nucleotide binding folds having 
Walker A and B ATP binding sites, said C-terminal domain 
having a plurality of membrane spanning helices. 

2. The nucleic acid molecule of claim 1, which 
is DNA. 

3. The DNA molecule of claim 2, which is a 
cDNA comprising a sequence approximately 5.9 kilobase 
pairs in length that encodes said MOAT-B transporter 
protein. 

4. The DNA molecule of claim 2, which is a 
gene comprising introns and exons, the exons of said gene 
specifically hybridizing with the nucleic acid of SEQ ID 
NO 1, and said exons encoding said MOAT-B transporter 
protein. 

5. An isolated RNA molecule transcribed from 
the nucleic acid of claim 1. 

6. The nucleic acid molecule of claim 1, 
wherein said sequence encodes a MOAT-B transporter 
protein having an amino acid sequence selected from the 
group consisting of SEQ ID NO 2 and amino acid sequences 
encoded by natural allelic variants of said sequence. 

7. The nucleic acid molecule of claim 6, which 
comprises SEQ ID NO 1. 

8. An antibody immunologically specific for 
the protein encoded by the nucleic acid of claim 1. 

9. An antibody as claimed in claim 8, said 
antibody being monoclonal. 

10. An antibody as claimed in claim 8, said 
antibody being polyclonal. 

11. An isolated nucleic acid molecule having 
the sequence of SEQ ID NO: 3, said nucleic acid molecule 
comprising a sequence encoding a MOAT-C transporter 
protein about 1450 amino acids in length, said 
transporter protein having a multi-domain structure 
including a tandem repeat of nucleotide binding folds, 
said nucleotide binding folds having Walker A and B 
binding sites, and a C-terminal hydrophobic domain that 
contains several membrane spanning helices. 

12. The nucleic acid molecule of claim 11, which is 
DNA. 

13. The DNA molecule of claim 12, which is a cDNA 
comprising a sequence approximately 6.6 kilobase pairs in 
length that encodes said MOAT-C transporter protein. 

14. The DNA molecule of claim 12, which is a gene 
comprising introns and exons, the exons of said gene 
specifically hybridizing with the nucleic acid of SEQ ID 
NO 3, and said exons encoding said MOAT-C transporter 
protein. 

15. An isolated RNA molecule transcribed from the 
nucleic acid of claim 11. 



16. The nucleic acid molecule of claim 11, wherein 
said sequence encodes a MOAT-C transporter protein having 
an amino acid sequence selected from the group consisting 
of SEQ ID NO 4 and amino acid sequences encoded by 
natural allelic variants of said sequence. 

17. The nucleic acid molecule of claim 11, which 
comprises SEQ ID NO 3. 

18. An antibody immunologically specific for the 
protein encoded by the nucleic acid of claim 11. 

19. An antibody as claimed in claim 18, said 
antibody being monoclonal. 

20. An antibody as claimed in claim 18, said 
antibody being polyclonal. 

21. An oligonucleotide between about 10 and about 
200 nucleotides in length, which specifically hybridizes 
with a protein translation initiation site in a 
nucleotide sequence encoding amino acids of SEQ ID NO 4. 

22. An oligonucleotide between about 10 and about 
200 nucleotides in length, which specifically hybridizes 
with a protein translation initiation site in a 
nucleotide sequence encoding amino acids of SEQ ID NO 2. 

23. An isolated nucleic acid molecule having the 
sequence of SEQ ID NO: 5, said nucleic acid molecule 
comprising a sequence encoding a MOAT-D transporter 
protein about 1550 amino acids in length, said 
transporter protein having a multi-domain structure 
including a tandem repeat of nucleotide binding folds, 
said nucleotide binding folds having Walker A and B 
binding sites, and a C-terminal hydrophobic domain that 
contains several membrane spanning helices. 



24. The nucleic acid molecule of claim 23, which is 
DNA. 

25. The DNA molecule of claim 24, which is a cDNA 
comprising a sequence approximately 6 kilobase pairs in 
length that encodes said MOAT-D transporter protein. 

26. The DNA molecule of claim 24, which is a gene 
comprising introns and exons, the exons of said gene 
specifically hybridizing with the nucleic acid of SEQ ID 
NO 5, and said exons encoding said MOAT-D transporter 
protein. 

27. An isolated RNA molecule transcribed from the 
nucleic acid of claim 23. 

28. The nucleic acid molecule of claim 23, wherein 
said sequence encodes a MOAT-D transporter protein having 
an amino acid sequence selected from the group consisting 
of SEQ ID NO 6 and amino acid sequences encoded by 
natural allelic variants of said sequence. 

29. The nucleic acid molecule of claim 23, which 
comprises SEQ ID NO 5. 

30. An antibody immunologically specific for the 
protein encoded by the nucleic acid of claim 23. 

31. An antibody as claimed in claim 30, said 
antibody being monoclonal. 

32. An antibody as claimed in claim 30, said 
antibody being polyclonal. 

33. An oligonucleotide between about 10 and about 
200 nucleotides in length, which specifically hybridizes 
with a protein translation initiation site in a 
nucleotide sequence encoding amino acids of SEQ ID NO 6. 

34. An isolated nucleic acid molecule having the 
sequence of SEQ ID NO: 7, said nucleic acid molecule 
comprising a nucleotide sequence encoding a MOAT-E 
transporter protein about 1503 amino acids in length, 
said transporter protein having a multi-domain structure 
including a tandem repeat of nucleotide binding folds, 
said nucleotide binding folds having Walker A and B 
binding sites, and a C-terminal hydrophobic domain that 
contains several membrane spanning helices. 

35. The nucleic acid molecule of claim 34, 
which is DNA. 

36. The DNA molecule of claim 35, which is a 
cDNA comprising a sequence approximately 6 kilobase pairs 
in length that encodes said MOAT-E transporter protein. 

37. The DNA molecule of claim 35, which is a 
gene comprising introns and exons, the exons of said gene 
specifically hybridizing with the nucleic acid of SEQ ID 
NO 7, and said exons encoding said MOAT-E transporter 
protein. 

38. An isolated RNA molecule transcribed from 
the nucleic acid of claim 34. 

39. The nucleic acid molecule of claim 34, 
wherein said sequence encodes a MOAT-E transporter 
protein having an amino acid sequence selected from the 
group consisting of SEQ ID NO 8 and amino acid sequences 
encoded by natural allelic variants of said sequence. 

40. The nucleic acid molecule of claim 39, 
which comprises SEQ ID NO 7. 

41. An antibody immunologically specific for 
the protein encoded by the nucleic acid of claim 34. 

42. An antibody as claimed in claim 41, said 
antibody being monoclonal. 

43. An antibody as claimed in claim 41, said 
antibody being polyclonal. 

44. An oligonucleotide between about 10 and 
about 200 nucleotides in length, which specifically 
hybridizes with a protein translation initiation site in 
a nucleotide sequence encoding amino acids of SEQ ID NO 
7. 

45. A plasmid comprising a nucleotide sequence 
selected from the group consisting of SEQ ID NO: 1, SEQ 
ID NO: 3, SEQ ID NO: 5 and SEQ ID NO: 7. 

46. A vector comprising a nucleotide sequence 
selected from the group consisting of SEQ ID NO: 1, SEQ 
ID NO: 3, SEQ ID NO: 5 and SEQ ID NO: 7. 

47. A retroviral vector comprising a 
nucleotide sequence selected from the group consisting of 
SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5 and SEQ ID NO: 7. 

48. A host cell comprising at least one 
nucleic acid molecule having a sequence selected from the 
group consisting of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID 
NO: 5 and SEQ ID NO: 7. 

49. A host cell as claimed in claim 48, 
wherein said host cell is selected from the group 
consisting of bacterial, fungal, mammalian, insect and 
plant cells. 

50. A host cell as claimed in claim 48, 
wherein said nucleic acid is provided in a plasmid and is 
operably linked to mammalian regulatory elements which 
confer high expression and stability of mRNA transcribed 
from said nucleic acid. 

51. A host cell as claimed in claim 48, 
wherein said nucleic acid is provided in a plasmid and is 
operably linked to mammalian regulatory control elements 
in reverse anti-sense orientation. 

52. A host animal comprising at least one 
nucleic acid molecule selected from the group consisting 
of SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5 and SEQ ID 
NO: 7. 

53. A host animal as claimed in claim 52, 
wherein said animal harbors a homozygous null mutation in 
its endogenous MOAT gene wherein said mutation has been 
introduced into said mouse or an ancestor of said mouse 
via homologous recombination in embryonic stem cells, and 
further wherein said mouse does not express a functional 
mouse MOAT protein. 

54. The transgenic mouse of claim 53, wherein 
said mouse is fertile and transmits said null mutation to 
its offspring. 

55. The transgenic mouse of claim 53, wherein 
said null mutation has been introduced into an ancestor 
of said mouse at an embryonic stage following 
microinjection of embryonic stem cells into a mouse 
blastocyst. 

56. A method for screening a test compound for 
inhibition of MOAT mediated transport, comprising: 

a) providing a host cell expressing at least 
one MOAT-encoding nucleic acid having a sequence selected 
from the group consisting of SEQ ID NOS: 1, 3, 5, and 7; 

b) contacting said host cell with a compound 
suspected of inhibiting MOAT-mediated transporter 
activity; and 

c) assessing inhibition of transport mediated 
by said compound. 

57. A method as claimed in claim 56, wherein 
inhibition of MOAT mediated transport is indicated by 
restoration of anticancer drug sensitivity. 

58. A method as claimed in claim 57, wherein 
said inhibition of MOAT mediated transport is indicated 
by a reduction of transporter mediated cellular efflux of 
anticancer agents. 

59. A kit for detecting the presence of MOAT 
encoding nucleic acids in a sample, comprising: 

a) oligonucleotide primers specific for 
amplification of MOAT encoding nucleic acids; 

b) polymerase enzyme; 

c) amplification buffer; and 

d) MOAT specific DNA for use as a 
positive control. 

Fig. 5A 



1 MKDIDIGKEY IIPSPGYRSV RERTSTSGTH RDREDSKFRR TRPLECQDAL ETAARAEGLS 

61 LDASMHSOLR ILDEEHPKGK VHHGLSALKP IRTTSKHQHP VDNAGLFSCM TFSWLSSLAR 

12! VAHKKGELSM EDVWSLSKHE SSDVNCRRLE RLWQEELKEV GPDAASLRRV VWIFC^rI? 

™' 1 1H2 

181 LSIVCLMITQ LAGFSGPAFM VKHLLEYTQA TESNLQYSLL LVLGLLLTEI VRSHSLALTW 

241 ALNYRTGVRL RGAILTMAFK KILKLKNIKE KSLGELIKIC SNDGQRMFE A AAVgTlLAGG 

301 PWAILGMIY NVHLGPTGP LGSAVFILFT PAMMFASRLT AYFRRKCVAA TDERYQKMNE 

361 VLTYIKFIKM YAWVKAFSQS VQKIREEERR ILEKAGYFQG ITVGVAPIW ^AEWTFSV 

421 HMTLGFDLTA AQAFTWTVF HSKTFALfCVT PFSVKSLSEA SVAVDRFKSL FLMEEVHMIK 

481 KKPASPHIKI EMKNATLAWD SSHSSIQNSP KLTPKMKKDK RASRGKKEKV RQLQRTEHQA 

541 VIAEQKGHLL LDSDERPSPE EEEGKHIHIX5 HLRI^RTMsTdLEIQEGKL VGI CCSVGSG 

601 KTSLISAILG QMTLLEGSIA ISGTFAYVAQ OAWILNATLR DNILFGKEYD EERYNSVLNS 

661 ^RPDLAIL PSSDLTEIGE RGAKLSG^R QRISIARAXj sb-rsit^ PLSALDAHVG 

721 KHIFHSAIRK HLKSKTVLFV THQLQYLVDC DEVIFMKEGC ITERGT^EL HNLNGDYATI 

781 FNNLLLGETP PVEIKSKKET SGSQKKSQDK GPKTGSVKKE KAVKPEEGQL VQLEEKGQGS 
TM7 

841 YPWSVYGVYI QAAGGPLAFL VTKAIiFMLHY GSTAFSTWWL SYWIKQGSGN TTVTRGNETS 
TMR 

901 VSDSMKDNPH MQYYASIYAL SKAVMLILKA IRGWFVKGT LRASSELHDE LFRR1LRSPM 

961 KTrVTTPTGR ILMR FSKDKD EVDYP^PFQA EMFIQUYI^T FFCYGMIAGY T^ZZ^ 

1021 LVILFSVLHI VSRVLIRELK RIMtQSPr LSHTTSSIQG IATIEAYHKG QEFLHRTQEL 

1081 LDDNQAPFFL F TCAMRWLAY RLDLISUUJ TTTGLMTVIiM HGQIPPAY AG I^YAVQLT 

1141 GLPQFTVRLA SETEARFTSV ERIKHYIKTL ELEAPAHIKN KAPSPDWPQE GEVTFEHAEM 
(+-NBF2 

1201 RYREKLPLVL KKVSFTIKPK EKIGtVGR__psSLGMAL FFXVELSGGC IKIDGVRISD 
1261 IGLADLRSKL SIIPQEPVLF SGTVHSNUJP FNQYTEDQIW DALERTHMKE CIA.QLPLKLE 

1321 sevmengdnf sv^r QLLCI arallrhc p lilde ataam dteSSe TIREAFADCT 

1381 KLTIAHRLET VLGSDRIKVL AQGQWEFDT PSVLLSNDSS RFYAMFAAAE HKVAVKG 

Fig. 5B 

1 MGPMDALCGS GELGS KFWDS HLSVHTEN PD LTPCFQ NSLL AWVPCIYLWV ALPCYl£ yLR 
61 HHCRGYIILS HLSK lSvlg VLLWCVSWAD LFYSFHGLVH GRAP APVFFV TPLWGVTKL 
121 LATLLIQYEK LQGV QSijGVL XIFWP^WC AIVPFRSKIt, LAKAEGEISD PFRF^FyTS 
18! FALVLSALI L ACFREKPPFF SAKNVDPNPY PETSVGFLSR LFFWWF'f KMA IYGYRHPLEE 
2 <1 *°LWSLK EE D RSQMWQQLL EAWRKQEKQT ARHKASAAPG KNASGEDEVL LGARPRPRKP 
301 SFL ^ 7 Tt ' KLIQDLLSFI NPQLLSILIR FISNPMAPSW WGFLVAGLMF 

361 LCSMMQ SLIL QHYYHYIFVT GVKFRTG IMG VIYRKALVIT HSVKRASTVG EIVNLMSVDA 
«1 QRFMDLAPFX NLLHSAPLQI ILAIYFLWQH LGPSVLAGVA F^LM AVA VKMRAFQ 

«i ^ia«n« irvucMM* PSFLKQVEG! rqge^llrt aaylhttttf 

541 TWMCSPFLVT L ITLW VYVYV DPNNVLDAEK A^VSLFHxTrL^PQ L ISHLTQASV 

601 SLKKIQQFLS QEEI^PQSVE ^XSPGYAI TIHSGTFTWA QDI-PPtS DIQVPKGALV 

661 AVVGPV^KJSI.VSALLGE MEKLEGKVHM KGSVAYVPQQ AWIQNCTLQE NVLFGKALNP 

721 KRYCWTLEAC^LABLEMLP GGDQTEIGEK GIHLSGGQRQ RVSLARAVYS DADIFLLDDP 

781 LSAVDSHVAK HIFDHVIGPE GVLAGKTRVL VTHGISFLPQ TDFIIVLADG QVSEMGPYPA 

841 LLQRNGSFAN FLCKYAPDED QGHLEDSWTA LEGAEDKEAL LIEDTLSHHT DLTDNDPVTY 

901 VVQKQF MRQL SALSS ^Q GRPVPR&HLG PSEKVQVTEA KADGALTQEE KAAIGTVELS 

9 " ™™^*™*IC LL YVGQSAAAIG AMVWLSAWTK DAMADSRQNN TSLRI^AA 

1021 LSIIOGFLVH LAAKAMAAGG IQAAKVLEQA LLENKIRSPQ SFFDTTPSGR ILNCFSKDIY 

1081 ^^viapv^ uu^sffka istlwimas tp utwilp SSLlvqk ftaa tsrqlk 

1141 RLESVSRSP^TSKFSETVTG ASVIRATNKS RDFEII^TK VDAKQRSCYP YIISNRHLSI 

"CI GVEFVGKCW LFAALFAVIG RSSLNPGLVG LSVSySJIt FAI.NWKIRMM SDLESKIVAV 

1261 ERVKETSKTE TEAPWWEGS RPPEGWPPRG EVEFRNYSVR TRPG^R DLSLHVHGGE 

1321 KVGI VGRTGA^ GKS SMTIiCLF RJLEAAKGEI RIDGLNVADI GLHDLRSQLT IIPQDPI^S 

1381 GTLRKNLDPF GSTSEEDIKW ALELSHLHTF VSSQPAGLDF QCSEGGEHLS YGQRQLVCXA 

1441 RALLRKSRIL_VLDEATAAID LETDNLIQAT IRTQFDTCTV LTIAHRLNTI MDYTKVLYLD 

1501 KGWAEFDSP AHLIAARGIF YGMARDAGLA 

Figure 8 

1 MAAPAEPCAG QGVWNQTEPE PAATSLLSLC FLRT AGVWVP PMYLtWLGPl YLLFIH HHGR 

61 GYLRMSPLFK AKMVLGFALI VLCTSSVAVA LWKIQQGTPE APEFLIHPTV WLTTMSFAVF 

121 LIHTERKKGV QSSGVLFGYW LLC FVLP AT N AAOOASGAGF OSDPVRHLST YLCLSLWAQ 

181 FVLSCLADQP PFFPEDPQQS NPCPETGAAF PSKATFWWVS GLWJRGYRRP LRPKDLWSLG 

241 RENSSEELVS RLEKEWMRNR SAARRHNKAI AFKRKGGSGM KAPETEPFLR QEGSQWRPLL 

301 KAIWQVFHST FLLGTLSLII SDVFRFTVPK LLSLFLEFIG DPKPPANK.GY LLAVLMFLSA 

361 CLQTLFEQQN MYRLKVPQMR LRSAITGLVY RKVLALSSGS RKASAVGDW NLVSVDVCjRL 

421 TESVLYLNGL WLPLVWIWC FVYLWQLLGP SALT A I AVFL SLLPLNFFIS KKRNHHQEEQ 

481 HROKDSRARL TSSILRKSKT IKFHGWEGAF LDRVLGIRGQ ELGALRTSGL LFSVSLVSFQ 

541 VSTFLVALW FAVHTLVAEN AMNAEKAFVT LTVLNILNKA QAFLPFSIHS LVQARVSFDR 

r-^NBFI 

€01 LVTFLCLEEV DPGWDSSSS GSAAGKDCIT IHSATFAWSQ ESPPCLHRIN LTVPQGCLLA 
661 WGPVGAGKS SLLSALLGEL SKVEGFVSIE GAVAYVPQEA WVQNTSWEN VCFGQELDPP 

A 

721 WLERVLEACA LQPDVDSFPE GIHTSIGEQG MNLSGGQKOR LSLARAVYRX A AVYLLDD PL 

NBFI-^ C B 

781 AALDAHVGQH VFKQVIGPGG LLCjGTTRILV THALHILPQA DWIIVLANGA IAEMGSYQEL 

841 LQRKGALVCL LDQARQPGDR GEGETEPGTS TKDPRGTSAG RRPELRRERS IKSVPEKDRT 

901 TSEAQTEVPL DDPDRAGWPA GKDSIQYGRV KATVHLAYLR AVGTPLCLYA LFLFLCQQVA 

961 SFCRGYWLSL WADDPAVGGQ QTQAALRGGI FGLLGCLQAI GLFASMAAVL LGGARASRLL 

1021 FQRLLWDWR SPISFFERTP IGHLLNRFSK ETDTVDVDIP DKLRSLLMYA FGLLEYSLW 

1081 AVATPLATVA ILPLFLLYAG FQSLYWSSC QLRRLESASY SSVCSHMAET FOGSTWRAF 

1141 RTQAPFVAQN NARVDESQRI SFPRLVADRW LAANVELLGN GLVFAAATCA VLSKAHLSKG 

1201 LVGFSVSAAL QVTQALQWW RNWTDLENSI VSVERMQDYA WTPKEAPWRL PTCAAQPPWP 

r*-NBF2 

1261 QGGQIEFRDF GLRYRPELPL aVqGVSLKIH AGEKVGIV GR TGAGKS SLAS GLLRLQEAAE 

A 

1321 GGIWIDGVPI AHVGLHTLRS RISIIPQDPI LFPGSLRMNL DLLQEHSDEA IWAALETVQL 

NBF2-*— 

1381 KALVASLPGQ LQYKCADRGE DL SVGQKQ LL CLARALLRKT Q ILILD EATA AVDPGTELQM 
1441 <?amlgswfaq CTVLLIAHRL RSVMDCARVL VMDKGQVAES GSPAQLLAQK GLFYRLAQES 
1501 GLV 

ATGCTGCCCGTGTACCAGGAGGTGAAGCCCAACCCGCTGCAGGACGCGAACATCTGCTCA 
TACGACGGGCACATGGTCCTCCACTTCGGGTTGGGCGACGTCCTGCGCTTGTAGACGAGT 

a MLPVYQEVKPNPLGPANICS - 

CGCGTGTTCTTCTGGTGGCTCAATCCCTTGTTTAAAATTGGCCATAAACGGAGATTAGAG 
GCGCACAAGAAGACCACCGAGTTAGGGAACAAATTTTAACCGGTATTTGCCTCTAATCTC 

a RVFFWWLNPLFKIGHKRRLE- 

GAAGATGATATGTATTCAGTGCTGCCAGAAGACCGCTCACAGCACCTTGGAGAGGAGTTG 

121 + + + + + + 180 

CTTCTACTATACATAAGTCACGACGGTCTTCTGGCGAGTGTCGTGGAACCTCTCCTCAAC 

a EDDMYSVLPEDRSQHLGEEL- 

CAAGGGTTCTGGGATAAAGAAGTTTTAAGAGCTGAGAATGACGCACAGAAGCCTTCTTTA 
GTTCCCAAGACCCTATTTCTTCAAAATTCTCGACTCTTACTGCGTGTCTTCGGAAGAAAT 

a QGFWDKEVLRAENDAQKPSL- 

ACAAGAGCAATCATAAAGTGTTACTGGAAATCTTATTTAGTTTTGGGAAI I I I IACGTTA 

241 + + + + + + 300 

TGTTCTCGTTAGTATTTCACAATGACCTTTAGAATAAATCAAAACCCTTAAAAATGCAAT 

a TRAIIKCYWKSYLVLGIFTL - 

ATTGAGGAAAGTGCCAAAGTAATCCAGCCCATA I I 1 I 1 GGG AAAAATTATT AATTATTTT 

301 + + + + + + 360 

TAACTCCTTTCACGGTTTCATTAGGTCGGGTATAAAAACCCTTTTTAATAATTAATAAAA 

a IEESAKV1QPIFLGKIINYF - 

GAAAATTATGATCCCATGGATTCTGTGGCTTTGAACACAGCGTACGCCTATGCCACGGTG 

361 + + + + + + 420 

CTTTTAATACTAGGGTACCTAAGACACCGAAACTTGTGTCGCATGCGGATACGGTGCCAC 

ENYDPMDSVALNTAYAYATV - 

CTGACl I I I I GCACGCTCATTTTGGCTATACTGCATCACTTATATTTTTATCACGTTCAG 
GACTGAAAAACGTGCGAGTAAAACCGATATGACGTAGTGAATATAAAAATAGTGCAAGTC 
LTFCTLILAILHHLYFYHVQ - 

TGTG CTGGGATG AG GTTACG AGT AG CCATGTG CCATATG ATTTATCG G A AGG C ACTTCG T 
ACACGACCCTACTCCAATGCTCATCGGTACACGGTATACTAAATAGCCTTCCGTGAAGCA 
CAGMRLRVAMCHMIYRKALR - 

CTTAGTAACATGGCCATGGGGAAGACAACCACAGGCCAGATAGTCAATCTGCTGTCCAAT 
541 + — — + + + + + 600 

GAATCATTGTACCGGTACCCCTTCTGTTGGTGTCCGGTCTATCAGTTAGACGACAGGTTA 
LSNMAMGKTTTGQIVNLLSN - 

GATGTGAACAAGTTTGATCAGGTGACAGTGTTCTTACACTTCCTGTGGGCAGGACCACTG 

601 + + + + + + 660 

CTACACTTGTTCAAACTAGTCCACTGTCACAAGAATGTGAAGGACACCCGTCCTGGTGAC 

DVNKFDQVTVFLHFLWAGPL - 

CAGGCGATCGCAGTGACTGCCCTACTCTGGATGGAGATAGGAATATCGTGCCTTGCTGGG 

661 + + + + + + 720 

GTCCGCTAGCGTCACTG ACG G G ATG AG ACCTACCTCTATCCTTATAGCACG G AACG ACCC 

QAIAVTALLWMEIGISCLAG - 

ATG G C AGTTCTAATC ATTCTCCTG CCCTTGC AAAG CTG TTTTG G G AAG TTG TTCTC ATC A 
TACCGTCAAGATTAGTAAGAGGACGGGAACGTTTCGACAAAACCCTTCAACAAGAGTAGT 
MAVLIIILPLQSCFGKLFSS - 

CTGAGGAGTAAAACTGCAACTTTCACGGATGCCAGGATCAGGACCATGAATGAAGTTATA 
781 + + + + + + 840 

GACTCCTCATTTTGACGTTGAAAGTGCCTACGGTCCTAGTCCTGGTACTTACTTCAATAT 
a LRSKTATFTDARIRTMNEVI - 

ACTGGTATAAGGATAATAAAAATGTACGCCTGGGAAAAGTCATTTTCAAATCTTATTACC 

841 + + + - + + + 900 

TGACCATATTCCTATTATTTTTACATGCGGACCCTTTTCAGTAAAAGTTTAGAATAATGG 
a TGIRIIKMYAWEKSFSNLIT - 

90^I! T ? GAAAGAAGGAGAmCCAAGA ^ CT6AGAAG ^ CCTGCCTCAGGGGGATC 
TTAAACTCTTTCTTCCTCTAAAGGTTCTAAGACTCTTCAAGGACGGAGTCCCCCTACTTA 
a NLRKKEISKILRSSCLRGMN - 

aaccgaagcaaaaagtcacgttcgttttagtagcacaaacactggaagtggtggatgcac 
a lasffsaskiivfvtfttyv - 

1 2 Ctcctcggcagtgtgatca cagccagccgcgtgttcgtggcagtgacgctgtatggggct 

+ ~ + + -+ + + 1080 

gaggagccgtcacactagtgtcggtcggcgcacaagcaccgtcactgcgacataccccga 

a LLGSVITASRVFVAVTLYGA- 

GTGCGGCTGACGGrrACCCTCTTCTTCCCCTCAGCCATTGAGAGGGTGTCAGAGGCAATC 
1081 + + + + + + 1 140 

CACGCCGACTGCCAATGGGAGAAGAAGGGGAGTCGGTAACTCTCCCACAGTCTCCGTTAG 

a VRLTVTLFFPSAIERVSEAI - 

, 1 ^^^^^^A^^^^^AQACCTTTTTGCTACTTGATGAGATATCACAGCGCAACCGT 
+ + + + + + 1200 

CAGTCGTAGGCTTCTTAGGTCTGGAAAAACGATGAACTACTCTATAGTGTCGCGTTGGCA 



a VSIRRiqtf 



LLLOEISQRNR 



CAGCTGCCGTCAGATGGTAAAAAGATGGTGCATGTGCAGGATT 
201 + + + + J + + 126Q 

GTCGACGGCAGTCTACCATTTTTCTACCACGTACACGTCCTAAAATGACG, 



i A AAAACCCTA 

a QLPSDGKKMVHVQDFTAFWD 

^^AAGGCATCAGAGACCCCAACTCTACAAGGCCTTTCCTTTACTGTCAGACCTGGCGAATTG 
TTCCGTAGTCTCTGGGGTTGAGATGTTCCGGAAAGGAAATGACAGTCTGGACCGCTTAAC 
a KASETPTLQGLSTTVRPGEL - 

i32 y AGCTGTGGTCGGCCCCGT GGGAGCAGGGAAGTCATCACTGTTAAGTGCCGTGCTCGGG 
AATCGACACCAGCCGGGGCACCCTCGTCCCTTCAGTAGTGACAATTCACGGCACGAGCCC 
a LAVVGPVGAGKSSLLSAVLG - 

i38 G ^ 7TGGCCCCAAGT CACGGGCTGGTCAGCGTGCATGGAAGAATTGCCTATGTGTCTCAG 
CTTAACCGGGGTTCAGTGCCCGACCAGTCGCACGTACCTTCTTAACGGATACACAGAGTC 
a ELAPSHGLVSVHGRIAYVSQ - 

CAGCCCTGGGTGTTCTCGGGAACTCTGAGGAGTAATATTTTATTTGGGAAGAAATATGAA 
1441 + + + + + _ + 1500 

GTCGGGACCCACAAGAGCCCTTGAGACTCCTCATTATAAAATAAACCCTTCTTTATACTT 

a QPWVFSGTLRSNILFGKKYE - 

AAG G AACG ATATG AAAAAGTC ATAAAG GCTTGTG CTCTG AAAAAG G ATTTACAG CTGTTG 
1501 + + + + + + 1660 

TTCCTTG CTATACTTTTTC AG TATTTCCG AACACG AG ACTTTTTCCTAAATGTCG AC AAC 
a KERYEKVIKACALKKOLQLL - 

GAGGATGGTGATCTGACTGTGATAGGAGATCGGGGAACCACGCTGAGTGGAGGGCAGAAA 
1561 + + + + + + 1620 

CTCCTACCACTAGACTGACACTATCCTCTAGCCCCTTGGTGCGACTCACCTCCCGTCTTT 
a EDGDLTVIGORGTTLSGGQK - 

^^GCACGGGTAAACCTTGCAAGAGCAGTGTATCAAGATGCTGACATCTATCTCCTGGACGAT 
CGTGCCCATTTGGAACGTTCTCGTCACATAGTTCTACGACTGTAGATAGAGGACCTGCTA 

ARVNLARAVYQDADIYLLDD - 

CCTCTCAGTGCAGTAGATGCGGAAGTTAGCAGACACTTGTTCGAACTGTGTATTTGTCAA 
GGAGAGTCACGTCATCTACGCCTTCAATCGTCTGTGAACAAGCTTGACACATAAACAGTT 
PLSAVDAEVSRHLFELCICQ - 

ATTTTGCATGAGAAGATCACAATTTTAGTGACTCATCAGTTGCAGTACCTCAAAGCTGCA 

TA AA ACG TACTCTTCTAGTGTTAAAATCACTG AGTAGTC AACG TC ATG G AGTTTCG ACGT 

a ILHEKITILVTHQLQYLKAA - 

AGTCAGATTCTGATATTGAAAGATGGTAAAATGGTGCAGAAGGGGACTTACACTGAGTTC 
1801 + + + + + + 1B60 

TCAGTCTAAGACTATAACTTTCTACCATTTTACCACGTCTTCCCCTGAATGTGACTCAAG 

a SQILILKDGKMVQKGTYTEF- 

CT AAAATCTG GTATAG ATTTTG G CTCCCTTTTAAAG AAG GATAATGAG G A AAG TG A AC A A 

861 + + + + + + 1920 

GATTTTAGACCATATCTAAAACCGAGGGAAAATTTCTTCCTATTACTCCTTTCACTTGTT 

3 LKSGIDFGSLLKKDNEESEQ - 

CCTCCAGTTCCAGGAACTCCCACACTAAGGAATCGTACCTTCTCAGAGTCTTCGGTTTGG 
921 + + + + + + 1980 

GGAGGTCAAGGTCCTTGAGGGTGTGATTCCTTAGCATGGAAGAGTCTCAGAAGCCAAACC 

a PPVPGTPTLRNRTFSESSVW- 

TCTCAACAATCTTCTAGACCCTCCTTGAAAGATGGTGCTCTGGAGAGCCAAGATACAGAG 
1981 + + + + + + 2040 

AGAGTTGTTAGAAGATCTGGGAGGAACTTTCTACCACGAGACCTCTCGGTTCTATGTCTC 

a SQQSSRPSLKDGALESQDTE - 

AATGTCCCAGTTACACTATCAGAGGAGAACCGTTCTGAAGGAAAAGTTGGTTTTCAGGCC 
TTACAGGGTCAATGTGATAGTCTCCTCTTGGCAAGACTTCCTTTTCAACCAAAAGTCCGG 

a NVPVTLSEENRSEGKVGFQA 

TATAAGAATTACTTCAGAGCTGGTGCTCACTGGATTGTCTTCATTTTCCTTATTCTCCTA 
ATATTCTTAATGAAGTCTCGACCACGAGTGACCTAACAGAAGTAAAAGGAATAAGAGGAT 

a YKNYFRAGAHWIVFIFLILL • 

AACACTGCAGCTCAGGTTGCCTATGTGCTTCAAGATTGGTGGCTTTCATACTGGGCAAAC 
TTGTGACGTCGAGTCCAACGGATACACGAAGTTCTAACCACCGAAAGTATGACCCGTTTG 

a NTAAQVAYVLQDWWLSYWAN- 

AAACAAAGTATGCTAAATGTCACTGTAAATGGAGGAGGAAATGTAACCGAGAAGCTAGAT 
2221 + + + + + + 2 2 80 

TTTGTTTCATACGATTTACAGTGACATTTACCTCCTCCTTTACATTGGCTCTTCGATCTA 

a KQSMLNVTVNGGGNVTEKLD- 

CTTAACTGGTACTTAGGAATTTATTCAGGTTTAACTGTAGCTACCGTTCTTTTTGGCATA

2281 + + + + + + 2340

GAATTGACCATGAATCCTTAAATAAGTCCAAATTGACATCGATGGCAAGAAAAACCGTAT

a LNWYLGIYSGLTVATVLFGI - 

GCAAGATCTCTATTGGTATTCTACGTCCTTGTTAACTCTTCACAAACTTTGCACAACAAA 

2341 + + + + + + 2400 

CGTTCTAGAGATAACCATAAGATGCAGGAACAATTGAGAAGTGTTTGAAACGTGTTGTTT 

a ARSLLVFYVLVNSSQTLHNK- 

ATGTTTGAGTCAATTCTGAAAGCTCCGGTATTATTCTTTGATAGAAATCCAATAGGAAGA

2401 + + + + + + 2460 

TACAAACTCAGTTAAGACTTTCGAGGCCATAATAAGAAACTATCTTTAGGTTATCCTTCT 

a MFESILKAPVLFFDRNPIGR - 

ATTTTAAATCGTTTCTCCAAAGACATTGGACACTTGGATGATTTGCTGCCGCTGACGTTT 
2461 + + + + + + 2520 

TAAAATTTAGCAAAGAGGTTTCTGTAACCTGTGAACCTACTAAACGACGGCGACTGCAAA 
a ILNRFSKDIGHLDDLLPLTF 



Figure 12F 




TTAGATTTCATCCAGACATTGCTACAAGTGGTTGGTGTGGTCTCTGTGGCTGTGGCCGTG 
2521 + + + + + + 2580

AATCTAAAGTAGGTCTGTAACGATGTTCACCAACCACACCAGAGACACCGACACCGGCAC 



a LDFIQTLLQVVGVVSVAVAV -



ATTCCTTGGATCGCAATACCCTTGGTTCCCCTTGGAATCATTTTCATTTTTCTTCGGCGA
TAAGGAACCTAGCGTTATGGGAACCAAGGGGAACCTTAGTAAAAGTAAAAAGAAGCCGCT
a IPWIAIPLVPLGIIFIFLRR -

TATTTTTTGGAAACGTCAAGAGATGTGAAGCGCCTGGAATCTACAACTCGGAGTCCAGTG 
2641 + + + + + + 2700 

ATAAAAAACCTTTGCAGTTCTCTACACTTCGCGGACCTTAGATGTTGAGCCTCAGGTCAC 

a YFLETSRDVKRLESTTRSPV - 

TTTTCCCACTTGTCATCTTCTCTCCAGGGGCTCTGGACCATCCGGGCATACAAAGCAGAA 
2701 + + + + + + 2760 

AAAAGGGTGAACAGTAGAAGAGAGGTCCCCGAGACCTGGTAGGCCCGTATGTTTCGTCTT 

a FSHLSSSLQGLWTIRAYKAE -



GAGAGGTGTCAGGAACTGTTTGATGCACACCAGGATTTACATTCAGAGGCTTGGTTCTTG
2761 + + + + + + 2820

CTCTCCACAGTCCTTGACAAACTACGTGTGGTCCTAAATGTAAGTCTCCGAACCAAGAAC
a ERCQELFDAHQDLHSEAWFL -

TTTTTGACAACGTCCCGCTGGTTCGCCGTCCGTCTGGATGCCATCTGTGCCATGTTTGTC
2821 + + + + + + 2880

AAAAACTGTTGCAGGGCGACCAAGCGGCAGGCAGACCTACGGTAGACACGGTACAAACAG

a FLTTSRWFAVRLDAICAMFV -

ATCATCGTTGCCTTTGGGTCCCTGATTCTGGCAAAAACTCTGGATGCCGGGCAGGTTGGT 

2881 + + + + + + 2940

TAGTAGCAACGGAAACCCAGGGACTAAGACCGTTTTTGAGACCTACGGCCCGTCCAACCA 
a IIVAFGSLILAKTLDAGQVG - 

TTGGCACTGTCCTATGCCCTCACGCTCATGGGGATGTTTCAGTGGTGTGTTCGACAAAGT 
2941 + + + + + + 3000

AACCGTGACAGGATACGGGAGTGCGAGTACCCCTACAAAGTCACCACACAAGCTGTTTCA 

a LALSYALTLMGMFQWCVRQS 

GCTGAAGTTGAGAATATGATGATCTCAGTAGAAAGGGTCATTGAATACACAGACCTTGAA
3001 + + + + + + 3060



CGACTTCAACTCTTATACTACTAGAGTCATCTTTCCCAGTAACTTATGTGTCTGGAACTT 
AEVENMMISVERVIEYTDLE - 

AAAGAAGCACCTTGGGAATATCAGAAACGCCCACCACCAGCCTGGCCCCATGAAGGAGTG
3061 + + + + + + 3120

TTTCTTCGTGGAACCCTTATAGTCTTTGCGGGTGGTGGTCGGACCGGGGTACTTCCTCAC
a KEAPWEYQKRPPPAWPHEGV -

ATAATCTTTGACAATGTGAACTTCATGTACAGTCCAGGTGGGCCTCTGGTACTGAAGCAT 



3121 + + + + + + 3180



TATTAGAAACTGTTACACTTGAAGTACATGTCAGGTCCACCCGGAGACCATGACTTCGTA 
IIFDNVNFMYSPGGPLVLKH - 

CTGACAGCACTCATTAAATCACAAGAAAAGGTTGGCATTGTGGGAAGAACCGGAGCTGGA
3181 + + + + + + 3240



GACTGTCGTGAGTAATTTAGTGTTCTTTTCCAACCGTAACACCCTTCTTGGCCTCGACCT 
a LTALIKSQEKVGIVGRTGAG - 

AAAAGTTCCCTCATCTCAGCCCTTTTTAGATTGTCAGAACCCGAAGGTAAAATTTGGATT 



3241 + + + + + + 3300

TTTTCAAGGGAGTAGAGTCGGGAAAAATCTAACAGTCTTGGGCTTCCATTTTAAACCTAA
a KSSLISALFRLSEPEGKIWI - 

GATAAGATCTTGACAACTGAAATTGGACTTCACGATTTAAGGAAGAAAATGTCAATCATA 



3301 + + + + + + 3360

CTATTCTAGAACTGTTGACTTTAACCTGAAGTGCTAAATTCCTTCTTTTACAGTTAGTAT
a DKILTTEIGLHDLRKKMSII - 

CCTCAGGAACCTGTTTTGTTCACTGGAACAATGAGGAAAAACCTGGATCCCTTTAAGGAG 
3361 + + + + + + 3420 
GGAGTCCTTGGACAAAACAAGTGACCTTGTTACTCCTTTTTGGACCTAGGGAAATTCCTC
PQEPVLFTGTMRKNLDPFKE - 

CACACGGATGAGGAACTGTGGAATGCCTTACAAGAGGTACAACTTAAAGAAACCATTGAA 

3421 + + + + + + 3480

GTGTGCCTACTCCTTGACACCTTACGGAATGTTCTCCATGTTGAATTTCTTTGGTAACTT 

HTDEELWNALQEVQLKETIE - 

GATCTTCCTGGTAAAATGGATACTGAATTAGCAGAATCAGGATCCAATTTTAGTGTTGGA 
CTAGAAGGACCATTTTACCTATGACTTAATCGTCTTAGTCCTAGGTTAAAATCACAACCT
DLPGKMDTELAESGSNFSVG - 

CAAAGACAACTGGTGTGCCTTGCCAGGGCAATTCTCAGGAAAAATCAGATATTGATTATT 
GTTTCTGTTGACCACACGGAACGGTCCCGTTAAGAGTCCTTTTTAGTCTATAACTAATAA 
QRQLVCLARAILRKNQILII -

GATGAAGCGACGGCAAATGTGGATCCAAGAACTGATGAGTTAATACAAAAAAAAATCCGG 

3601 + + + + + + 3660 

CTACTTCGCTGCCGTTTACACCTAGGTTCTTGACTACTCAATTATGTTTTTTTTTAGGCC

DEATANVDPRTDELIQKKIR - 

GAGAAATTTGCCCACTGCACCGTGCTAACCATTGCACACAGATTGAACACCATTATTGAC

3661 + + + + + + 3720 

CTCTTTAAACGGGTGACGTGGCACGATTGGTAACGTGTGTCTAACTTGTGGTAATAACTG 

EKFAHCTVLTIAHRLNTIID - 

AGCGACAAGATAATGGTTTTAGATTCAGGAAGACTGAAAGAATATGATGAGCCGTATGTT 
TCGCTGTTCTATTACCAAAATCTAAGTCCTTCTGACTTTCTTATACTACTCGGCATACAA 
SDKIMVLDSGRLKEYDEPYV -

TTGCTGCAAAATAAAGAGAGCCTATTTTACAAGATGGTGCAACAACTGGGCAAGGCAGAA 

3781 + + + + + + 3840 

AACGACGTTTTATTTCTCTCGGATAAAATGTTCTACCACGTTGTTGACCCGTTCCGTCTT
a LLQNKESLFYKMVQQLGKAE

GCCGCTGCCCTCACTGAAACAGCAAAACAGGTATACTTCAAAAGAAATTATCCACATATT 
CGGCGACGGGAGTGACTTTGTCGTTTTGTCCATATGAAGTTTTCTTTAATAGGTGTATAA 

a AAALTETAKQVYFKRNYPHI - 

GGTCACACTGACCACATGGTTACAAACACTTCCAATGGACAGCCCTCGACCTTAACTATT 
3901 + + + + + + 3960

CCAGTGTGACTGGTGTACCAATGTTTGTGAAGGTTACCTGTCGGGAGCTGGAATTGATAA

a GHTDHMVTNTSNGQPSTLTI - 

TTCGAGACAGCACTG 

3961 + 3975 

AAGCTCTGTCGTGAC 

a FETAL- 
ATGAAGGATATCGACATAGGAAAAGAGTATATCATCCCCAGTCCTGGGTATAGAAGTGTG

1 + + + + + + 60

TACTTCCTATAGCTGTATCCTTTTCTCATATAGTAGGGGTCAGGACCCATATCTTCACAC 

MKDIDIGKEYIIPSPGYRSV - 

AGGGAGAGAACCAGCACTTCTGGGACGCACAGAGACCGTGAAGATTCCAAGTTCAGGAGA 
TCCCTCTCTTGGTCGTGAAGACCCTGCGTGTCTCTGGCACTTCTAAGGTTCAAGTCCTCT 
RERTSTSGTHRDREDSKFRR - 

ACTCGACCGTTGGAATGCCAAGATGCCTTGGAAACAGCAGCCCGAGCCGAGGGCCTCTCT 
TGAGCTGGCAACCTTACGGTTCTACGGAACCTTTGTCGTCGGGCTCGGCTCCCGGAGAGA 
TRPLECQDALETAARAEGLS - 

CTTGATGCCTCCATGCATTCTCAGCTCAGAATCCTGGATGAGGAGCATCCCAAGGGAAAG 
181 + + + + + + 240

GAACTACGGAGGTACGTAAGAGTCGAGTCTTAGGACCTACTCCTCGTAGGGTTCCCTTTC 
LDASMHSQLRILDEEHPKGK - 

TACCATCATGGCTTGAGTGCTCTGAAGCCCATCCGGACTACTTCCAAACACCAGCACCCA 
241 + + + + + + 300 

ATGGTAGTACCGAACTCACGAGACTTCGGGTAGGCCTGATGAAGGTTTGTGGTCGTGGGT 
YHHGLSALKPIRTTSKHQHP - 

GTGGACAATGCTGGGCTTTTTTCCTGTATGACTTTTTCGTGGCTTTCTTCTCTGGCCCGT 
CACCTGTTACGACCCGAAAAAAGGACATACTGAAAAAGCACCGAAAGAAGAGACCGGGCA 
VDNAGLFSCMTFSWLSSLAR - 

GTGGCCCACAAGAAGGGGGAGCTCTCAATGGAAGACGTGTGGTCTCTGTCCAAGCACGAG 
CACCGGGTGTTCTTCCCCCTCGAGAGTTACCTTCTGCACACCAGAGACAGGTTCGTGCTC
a VAHKKGELSMEDVWSLSKHE -

TCTTCTGACGTGAACTGCAGAAGACTAGAGAGACTGTGGCAAGAAGAGCTGAATGAAGTT

AGAAGACTGCACTTGACGTCTTCTGATCTCTCTGACACCGTTCTTCTCGACTTACTTCAA 

a SSDVNCRRLERLWQEELNEV- 

GGGCCAGACGCTGCTTCCCTGCGAAGGGTTGTGTGGATCTTCTGCCGCACCAGGCTCATC
481 + + + + + + 540 

CCCGGTCTGCGACGAAGGGACGCTTCCCAACACACCTAGAAGACGGCGTGGTCCGAGTAG 
a GPDAASLRRVVWIFCRTRLI - 

CTGTCCATCGTGTGCCTGATGATCACGCAGCTGGCTGGCTTCAGTGGACCAGCCTTCATG 
541 + + + + + + 600 

GACAGGTAGCACACGGACTACTAGTGCGTCGACCGACCGAAGTCACCTGGTCGGAAGTAC 
a LSIVCLMITQLAGFSGPAFM- 

GTGAAACACCTCTTGGAGTATACCCAGGCAACAGAGTCTAACCTGCAGTACAGCTTGTTG 
601 + + + + + + 660

CACTTTGTGGAGAACCTCATATGGGTCCGTTGTCTCAGATTGGACGTCATGTCGAACAAC
a VKHLLEYTQATESNLQYSLL -

TTAGTGCTGGGCCTCCTCCTGACGGAAATCGTGCGGTCTTGGTCGCTTGCACTGACTTGG 
661 + + + + + + 720 

AATCACGACCCGGAGGAGGACTGCCTTTAGCACGCCAGAACCAGCGAACGTGACTGAACC 
a LVLGLLLTEIVRSWSLALTW -



GCATTGAATTACCGAACCGGTGTCCGCTTGCGGGGGGCCATCCTAACCATGGCATTTAAG

CGTAACTTAATGGCTTGGCCACAGGCGAACGCCCCCCGGTAGGATTGGTACCGTAAATTC 

a ALNYRTGVRLRGAILTMAFK - 

AAGATCCTTAAGTTAAAGAACATTAAAGAGAAATCCCTGGGTGAGCTCATCAACATTTGC
781 + + + + + + 840
TTCTAGGAATTCAATTTCTTGTAATTTCTCTTTAGGGACCCACTCGAGTAGTTGTAAACG
a KILKLKNIKEKSLGELINIC - 

TCCAACGATGGGCAGAGAATGTTTGAGGCAGCAGCCGTTGGCAGCCTGCTGGCTGGAGGA 
AGGTTGCTACCCGTCTCTTACAAACTCCGTCGTCGGCAACCGTCGGACGACCGACCTCCT 
a SNDGQRMFEAAAVGSLLAGG - 

CCCGTTGTTGCCATCTTAGGCATGATTTATAATGTAATTATTCTGGGACCAACAGGCTTC 
901 + + + + + + 960

GGGCAACAACGGTAGAATCCGTACTAAATATTACATTAATAAGACCCTGGTTGTCCGAAG 

a PVVAILGMIYNVIILGPTGF - 

CTGGGATCAGCTGTTTTTATCCTCTTTTACCCAGCAATGATGTTTGCATCACGGCTCACA
961 + + + + + + 1020 

GACCCTAGTCGACAAAAATAGGAGAAAATGGGTCGTTACTACAAACGTAGTGCCGAGTGT 
a LGSAVFILFYPAMMFASRLT -



GCATATTTCAGGAGAAAATGCGTGGCCGCCACGGATGAACGTGTCCAGAAGATGAATGAA 



1021 + + + + + + 1080



CGTATAAAGTCCTCTTTTACGCACCGGCGGTGCCTACTTGCACAGGTCTTCTACTTACTT 
a AYFRRKCVAATDERVQKMNE - 

GTTCTTACTTACATTAAATTTATCAAAATGTATGCCTGGGTCAAAGCATTTTCTCAGAGT
1081 + + + + + + 1140

CAAGAATGAATGTAATTTAAATAGTTTTACATACGGACCCAGTTTCGTAAAAGAGTCTCA 
a VLTYIKFIKMYAWVKAFSQS- 

GTTCAGAAAATCCGCGAGGAGGAGCGTCGGATATTGGAAAAAGCCGGGTACTTCCAGGGT 
CAAGTCTTTTAGGCGCTCCTCCTCGCAGCCTATAACCTTTTTCGGCCCATGAAGGTCCCA 
a VQKIREEERRILEKAGYFQG 

ATCACTGTGGGTGTGGCTCCCATTGTGGTGGTGATTGCCAGCGTGGTGACCTTCTCTGTT 



1201 + + + + + + 1260



TAGTGACACCCACACCGAGGGTAACACCACCACTAACGGTCGCACCACTGGAAGAGACAA 
a ITVGVAPIVVVIASVVTFSV -

CATATGACCCTGGGCTTCGATCTGACAGCAGCACAGGCTTTCACAGTGGTGACAGTCTTC
GTATACTGGGACCCGAAGCTAGACTGTCGTCGTGTCCGAAAGTGTCACCACTGTCAGAAG 
a HMTLGFDLTAAQAFTVVTVF - 

AATTCCATGACTTTTGCTTTGAAAGTAACACCGTTTTCAGTAAAGTCCCTCTCAGAAGCC
TTAAGGTACTGAAAACGAAACTTTCATTGTGGCAAAAGTCATTTCAGGGAGAGTCTTCGG 
a NSMTFALKVTPFSVKSLSEA - 

TCAGTGGCTGTTGACAGATTTAAGAGTTTGTTTCTAATGGAAGAGGTTCACATGATAAAG
1381 + + + + + + 1440

AGTCACCGACAACTGTCTAAATTCTCAAACAAAGATTACCTTCTCCAAGTGTACTATTTC 
a SVAVDRFKSLFLMEEVHMIK - 

AACAAACCAGCCAGTCCTCACATCAAGATAGAGATGAAAAATGCCACCTTGGCATGGGAC

1441 + + + + + + 1500 

TTGTTTGGTCGGTCAGGAGTGTAGTTCTATCTCTACTTTTTACGGTGGAACCGTACCCTG 

a NKPASPHIKIEMKNATLAWD - 

TCCTCCCACTCCAGTATCCAGAACTCGCCCAAGCTGACCCCCAAAATGAAAAAAGACAAG 
1501 + + + + + + 1560

AGGAGGGTGAGGTCATAGGTCTTGAGCGGGTTCGACTGGGGGTTTTACTTTTTTCTGTTC 
a SSHSSIQNSPKLTPKMKKDK - 

AGGGCTTCCAGGGGCAAGAAAGAGAAGGTGAGGCAGCTGCAGCGCACTGAGCATCAGGCG 
1561 + + + + + + 1620

TCCCGAAGGTCCCCGTTCTTTCTCTTCCACTCCGTCGACGTCGCGTGACTCGTAGTCCGC 
a RASRGKKEKVRQLQRTEHQA - 

GTGCTGGCAGAGCAGAAAGGCCACCTCCTCCTGGACAGTGACGAGCGGCCCAGTCCCGAA 
1621 + + + + + + 1680

CACGACCGTCTCGTCTTTCCGGTGGAGGAGGACCTGTCACTGCTCGCCGGGTCAGGGCTT
VLAEQKGHLLLDSDERPSPE -

GAGGAAGAAGGCAAGCACATCCACCTGGGCCACCTGCGCTTACAGAGGACACTGCACAGC 
1681 + + + + + + 1740

CTCCTTCTTCCGTTCGTGTAGGTGGACCCGGTGGACGCGAATGTCTCCTGTGACGTGTCG
EEEGKHIHLGHLRLQRTLHS -

ATCGATCTGGAGATCCAAGAGGGTAAACTGGTTGGAATCTGCGGCAGTGTGGGAAGTGGA 
1741 + + + + + + 1800

TAGCTAGACCTCTAGGTTCTCCCATTTGACCAACCTTAGACGCCGTCACACCCTTCACCT 

IDLEIQEGKLVGICGSVGSG - 

AAAACCTCTCTCATTTCAGCCATTTTAGGCCAGATGACGCTTCTAGAGGGCAGCATTGCA
1801 + + + + + + 1860

TTTTGGAGAGAGTAAAGTCGGTAAAATCCGGTCTACTGCGAAGATCTCCCGTCGTAACGT 
KTSLISAILGQMTLLEGSIA - 

ATCAGTGGAACCTTCGCTTATGTGGCCCAGCAGGCCTGGATCCTCAATGCTACTCTGAGA
1861 + + + + + + 1920

TAGTCACCTTGGAAGCGAATACACCGGGTCGTCCGGACCTAGGAGTTACGATGAGACTCT 
ISGTFAYVAQQAWILNATLR - 

GACAACATCCTGTTTGGGAAGGAATATGATGAAGAAAGATACAACTCTGTGCTGAACAGC 

1921 + + + + + + 1980

CTGTTGTAGGACAAACCCTTCCTTATACTACTTCTTTCTATGTTGAGACACGACTTGTCG

DNILFGKEYDEERYNSVLNS -

TGCTGCCTGAGGCCTGACCTGGCCATTCTTCCCAGCAGCGACCTGACGGAGATTGGAGAG 
1981 + + + + + + 2040

ACGACGGACTCCGGACTGGACCGGTAAGAAGGGTCGTCGCTGGACTGCCTCTAACCTCTC 
CCLRPDLAILPSSDLTEIGE - 

CGAGGAGCCAACCTGAGCGGTGGGCAGCGCCAGAGGATCAGCCTTGCCCGGGCCTTGTAT 
2041 + + + + + + 2100

GCTCCTCGGTTGGACTCGCCACCCGTCGCGGTCTCCTAGTCGGAACGGGCCCGGAACATA 
RGANLSGGQRQRISLARALY -
^AGTOACAGGAGCATCTACATCCTGGACGACCCCCTi

TCAGTGTCCTCGTAGATG^AG^G^GgIgtcTggaaTCTACGGGTACACCCG 
a SDRSIYILDDPLSALDAHVG -

TTGGTGTAGAAGTTATcIcGArAVG^^AG^TT^GTTCTGTCAAGACAAACAA 
a NHIFNSAIRKHLKSKTVLFV -

ACCCACCAGTTACAGTACCTGGTTGACTGTGATGAAGTGATCTTCATGAAAGAGGGCTGT
2221 + + + + + + 2280

TGGGTGGTCAATGTCATGGACCAACTGACACTACTTCACTAGAAGTACTTTCTCCCGACA

a THQLQYLVDCDEVIFMKEGC -

ATTACGGAAAGAGGCACCCATGAGGAACTGATGAATTTAAATGGTGACTATGCTACCATT
2281 + + + + + + 2340

TAATGCCTTTCTCCGTGGGTACTCCTTGACTACTTAAATTTACCACTGATACGATGGTAA
a ITERGTHEELMNLNGDYATI -

aaattattggacaacgaccctct + ctgtggcggtcaactctL 

a FNNLLLGETPPVEINSKKET -

TCACCAAGTGTCTTCTTCAGTGTTCTGTTCCCAGGATT^GTCCTAG 
SGSQKKSQDKGPKTGSVKKE - 

TTTCGTCATTTCGGTCTCCTTCCCGTCGAACACGTCGACCTTCTCTTTCCCGTCCCAAGT
a KAVKPEEGQLVQLEEKGQGS -
GTGCCCTGGTCAGTATATGGTGTCTACATCCAGGCTGCTGGGGGCCCCTTGGCATTCCTG

2521 + + + + + + 2580

CACGGGACCAGTCATATACCACAGATGTAGGTCCGACGACCCCCGGGGAACCGTAAGGAC 

VPWSVYGVYIQAAGGPLAFL - 

GTTATTATGGCCCTTTTCATGCTGAATGTAGGCAGCACCGCCTTCAGCACCTGGTGGTTG 
2581 + + + + + + 2640 

CAATAATACCGGGAAAAGTACGACTTACATCCGTCGTGGCGGAAGTCGTGGACCACCAAC

VIMALFMLNVGSTAFSTWWL -



AGTTACTGGATCAAGCAAGGAAGCGGGAACACCACTGTGACTCGAGGGAACGAGACCTCG 
2641 + + + + + + 2700

TCAATGACCTAGTTCGTTCCTTCGCCCTTGTGGTGACACTGAGCTCCCTTGCTCTGGAGC 
a SYWIKQGSGNTTVTRGNETS - 

GTGAGTGACAGCATGAAGGACAATCCTCATATGCAGTACTATGCCAGCATCTACGCCCTC 
2701 + + + + + + 2760 

CACTCACTGTCGTACTTCCTGTTAGGAGTATACGTCATGATACGGTCGTAGATGCGGGAG 
a VSDSMKDNPHMQYYASIYAL -



TCCATGGCAGTCATGCTGATCCTGAAAGCCATTCGAGGAGTTGTCTTTGTCAAGGGCACG 
2761 + + + + + + 2820

AGGTACCGTCAGTACGACTAGGACTTTCGGTAAGCTCCTCAACAGAAACAGTTCCCGTGC
SMAVMLILKAIRGVVFVKGT - 

CTGCGAGCTTCCTCCCGGCTGCATGACGAGCTTTTCCGAAGGATCCTTCGAAGCCCTATG 
2821 + + + + + + 2880

GACGCTCGAAGGAGGGCCGACGTACTGCTCGAAAAGGCTTCCTAGGAAGCTTCGGGATAC 



a LRASSRLHDELFRRILRSPM -



AAGTTTTTTGACACGACCCCCACAGGGAGGATTCTCAACAGGTTTTCCAAAGACATGGAT 
2881 + + + + + + 2940

TTCAAAAAACTGTGCTGGGGGTGTCCCTCCTAAGAGTTGTCCAAAAGGTTTCTGTACCTA 

KFFDTTPTGRILNRFSKDMD - 

GAAGTTGACGTGCGGCTGCCGTTCCAGGCCGAGATGTTCATCCAGAACGTTATCCTGGTG 
2941 + + + + + + 3000



CTTCAACTGCACGCCGACGGCAAGGTCCGGCTCTACAAGTAGGTCTTGCAATAGGACCAC 
a EVDVRLPFQAEMFIQNVILV 

TTCTTCTGTGTGGGAATGATCGCAGGAGTCTTCCCGTGGTTCCTTGTGGCAGTGGGGCCC 



3001 + + + + + + 3060

AAGAAGACACACCCTTACTAGCGTCCTCAGAAGGGCACCAAGGAACACCGTCACCCCGGG
FFCVGMIAGVFPWFLVAVGP - 



CTTGTCATCCTCTTTTCAGTCCTGCACATTGTCTCCAGGGTCCTGATTCGGGAGCTGAAG
3061 + + + + + + 3120



GAACAGTAGGAGAAAAGTCAGGACGTGTAACAGAGGTCCCAGGACTAAGCCCTCGACTTC 
a LVILFSVLHIVSRVLIRELK - 

CGTCTGGACAATATCACGCAGTCACCTTTCCTCTCCCACATCACGTCCAGCATACAGGGC 



3121 + + + + + + 3180



GCAGACCTGTTATAGTGCGTCAGTGGAAAGGAGAGGGTGTAGTGCAGGTCGTATGTCCCG 
a RLDNITQSPFLSHITSSIQG - 

CTTGCCACCATCCACGCCTACAATAAAGGGCAGGAGTTTCTGCACAGATACCAGGAGCTG 



3181 + + + + + + 3240



GAACGGTGGTAGGTGCGGATGTTATTTCCCGTCCTCAAAGACGTGTCTATGGTCCTCGAC 
a LATIHAYNKGQEFLHRYQEL - 



CTGGATGACAACCAAGCTCCTTTTTTTTTGTTTACGTGTGCGATGCGGTGGCTGGCTGTG
3241 + + + + + + 3300

GACCTACTGTTGGTTCGAGGAAAAAAAAACAAATGCACACGCTACGCCACCGACCGACAC

a LDDNQAPFFLFTCAMRWLAV -



CGGCTGGACCTCATCAGCATCGCCCTCATCACCACCACGGGGCTGATGATCGTTCTTATG 
3301 + + + + + + 3360 

GCCGACCTGGAGTAGTCGTAGCGGGAGTAGTGGTGGTGCCCCGACTACTAGCAAGAATAC 
a RLDLISIALITTTGLMIVLM - 

CACGGGCAGATTCCCCCAGCCTATGCGGGTCTCGCCATCTCTTATGCTGTCCAGTTAACG 



3361 + + + + + + 3420

GTGCCCGTCTAAGGGGGTCGGATACGCCCAGAGCGGTAGAGAATACGACAGGTCAATTGC
a HGQIPPAYAGLAISYAVQLT - 

GGGCTGTTCCAGTTTACGGTCAGACTGGCATCTGAGACAGAAGCTCGATTCACCTCGGTG 

3421 + + + + + + 3480

CCCGACAAGGTCAAATGCCAGTCTGACCGTAGACTCTGTCTTCGAGCTAAGTGGAGCCAC

GLFQFTVRLASETEARFTSV -

GAGAGGATCAATCACTACATTAAGACTCTGTCCTTGGAAGCACCTGCCAGAATTAAGAAC 
3481 + + + + + + 3540

CTCTCCTAGTTAGTGATGTAATTCTGAGACAGGAACCTTCGTGGACGGTCTTAATTCTTG 

a ERINHYIKTLSLEAPARIKN - 

AAGGCTCCCTCCCCTGACTGGCCCCAGGAGGGAGAGGTGACCTTTGAGAACGCAGAGATG 
3541 + + + + + + 3600 

TTCCGAGGGAGGGGACTGACCGGGGTCCTCCCTCTCCACTGGAAACTCTTGCGTCTCTAC 

a KAPSPDWPQEGEVTFENAEM -

AGGTACCGAGAAAACCTCCCTCTTGTCCTAAAGAAAGTATCCTTCACGATCAAACCTAAA 
3601 + + + + + + 3660 

TCCATGGCTCTTTTGGAGGGAGAACAGGATTTCTTTCATAGGAAGTGCTAGTTTGGATTT 
a RYRENLPLVLKKVSFTIKPK - 

GAGAAGATTGGCATTGTGGGGCGGACAGGATCAGGGAAGTCCTCGCTGGGGATGGCCCTC 
3661 + + + + + + 3720 

CTCTTCTAACCGTAACACCCCGCCTGTCCTAGTCCCTTCAGGAGCGACCCCTACCGGGAG 
a EKIGIVGRTGSGKSSLGMAL -

TTCCGTCTGGTGGAGTTATCTGGAGGCTGCATCAAGATTGATGGAGTGAGAATCAGTGAT
3721 + + + + + + 3780

AAGGCAGACCACCTCAATAGACCTCCGACGTAGTTCTAACTACCTCACTCTTAGTCACTA 
FRLVELSGGCIKIDGVRISD - 

ATTGGCCTTGCCGACCTCCGAAGCAAACTCTCTATCATTCCTCAAGAGCCGGTGCTGTTC 
3781 + + + + + + 3840

TAACCGGAACGGCTGGAGGCTTCGTTTGAGAGATAGTAAGGAGTTCTCGGCCACGACAAG
a IGLADLRSKLSIIPQEPVLF -

AGTGGCACTGTCAGATCAAATTTGGACCCCTTCAACCAGTACACTGAAGACCAGATTTGG
3841 + + + + + + 3900

TCACCGTGACAGTCTAGTTTAAACCTGGGGAAGTTGGTCATGTGACTTCTGGTCTAAACC
a SGTVRSNLDPFNQYTEDQIW -

GATGCCCTGGAGAGGACACACATGAAAGAATGTATTGCTCAGCTACCTCTGAAACTTGAA
3901 + + + + + + 3960

CTACGGGACCTCTCCTGTGTGTACTTTCTTACATAACGAGTCGATGGAGACTTTGAACTT

a DALERTHMKECIAQLPLKLE -

TCTGAAGTGATGGAGAATGGGGATAACTTCTCAGTGGGGGAACGGCAGCTCTTGTGCATA 
3961 + + + + + + 4020

AGACTTCACTACCTCTTACCCCTATTGAAGAGTCACCCCCTTGCCGTCGAGAACACGTAT 

SEVMENGDNFSVGERQLLCI - 

GCTAGAGCCCTGCTCCGCCACTGTAAGATTCTGATTTTAGATGAAGCCACAGCTGCCATG 
4021 + + + + + + 4080

CGATCTCGGGACGAGGCGGTGACATTCTAAGACTAAAATCTACTTCGGTGTCGACGGTAG 
ARALLRHCKILILDEATAAM - 

GACACAGAGACAGACTTATTGATTCAAGAGACCATCCGAGAAGCATTTGCAGACTGTACC 
4081 + + + + + + 4140

CTGTGTCTCTGTCTGAATAACTAAGTTCTCTGGTAGGCTCTTCGTAAACGTCTGACATGG
DTETDLLIQETIREAFADCT - 

ATGCTGACCATTGCCCATCGCCTGCACACGGTTCTAGGCTCCGATAGGATTATGGTGCTG
4141 + + + + + + 4200

TACGACTGGTAACGGGTAGCGGACGTGTGCCAAGATCCGAGGCTATCCTAATACCACGAC
MLTIAHRLHTVLGSDRIMVL -

GCCCAGGGACAGGTGGTGGAGTTTGACACCCCATCGGTCCTTCTGTCCAACGACAGTTCC
4201 + + + + + + 4260

CGGGTCCCTGTCCACCACCTCAAACTGTGGGGTAGCCAGGAAGACAGGTTGCTGTCAAGG
AQGQVVEFDTPSVLLSNDSS -

CGATTCTATGCCATGTTTGCTGCTGCAGAGAACAAGGTCGCTGTCAAGGGCTGA 
GCTAAGATACGGTACAAACGACGACGTCTCTTGTTCCAGCGACAGTTCCCGACT 
RFYAMFAAAENKVAVKG * -

MOAT D cDNA AND AMINO ACID SEQUENCE ENCODED THEREBY
ATGGACGCCCTGTGCGGTTCCGGGGAGCTCGGCTCCAAGTTCTGGGACTCCAACCTGTCT
TACCTGCGGGACACGCCAAGGCCCCTCGAGCCGAGGTTCAAGACCCTGAGGTTGGACAGA 
a MDALCGSGELGSKFWDSNLS - 

GTGCACACAGAAAACCCGGACCTCACTCCCTGCTTCCAGAACTCCCTGCTGGCCTGGGTG 
61 + + + + + + 120

CACGTGTGTCTTTTGGGCCTGGAGTGAGGGACGAAGGTCTTGAGGGACGACCGGACCCAC 

a VHTENPDLTPCFQNSLLAWV - 

CCCTGCATCTACCTGTGGGTCGCCCTGCCCTGCTACTTGCTCTACCTGCGGCACCATTGT 
121 + + + + + + 180

GGGACGTAGATGGACACCCAGCGGGACGGGACGATGAACGAGATGGACGCCGTGGTAACA 

a PCIYLWVALPCYLLYLRHHC - 

CGTGGCTACATCATCCTCTCCCACCTGTCCAAGCTCAAGATGGTCCTGGGTGTCCTGCTG
181 + + + + + + 240 

GCACCGATGTAGTAGGAGAGGGTGGACAGGTTCGAGTTCTACCAGGACCCACAGGACGAC 
a RGYIILSHLSKLKMVLGVLL - 

TGGTGCGTCTCCTGGGCGGACCTTTTTTACTCCTTCCATGGCCTGGTCCATGGCCGGGCC 
241 + + + + + + 300 

ACCACGCAGAGGACCCGCCTGGAAAAAATGAGGAAGGTACCGGACCAGGTACCGGCCCGG 

a WCVSWADLFYSFHGLVHGRA - 

CCTGCCCCTGTTTTCTTTGTCACCCCCTTGGTGGTGGGGGTCACCATGCTGCTGGCCACC 
301 + + + + + + 360 

GGACGGGGACAAAAGAAACAGTGGGGGAACCACCACCCCCAGTGGTACGACGACCGGTGG 

PAPVFFVTPLVVGVTMLLAT - 

CTGCTGATACAGTATGAGCGGCTGCAGGGCGTACAGTCTTCGGGGGTCCTCATTATCTTC 
GACGACTATGTCATACTCGCCGACGTCCCGCATGTCAGAAGCCCCCAGGAGTAATAGAAG
a LLIQYERLQGVQSSGVLIIF - 

TGGTTCCTGTGTGTGGTCTGCGCCATCGTCCCATTCCGCTCCAAGATCCTTTTAGCCAAG 
421 + + + + + + 480

ACCAAGGACACACACCAGACGCGGTAGCAGGGTAAGGCGAGGTTCTAGGAAAATCGGTTC

a WFLCVVCAIVPFRSKILLAK -

GCAGAGGGTGAGATCTCAGACCCCTTCCGCTTCACCACCTTCTACATCCACTTTGCCCTG 
CGTCTCCCACTCTAGAGTCTGGGGAAGGCGAAGTGGTGGAAGATGTAGGTGAAACGGGAC 

a AEGEISDPFRFTTFYIHFAL -

GTACTCTCTGCCCTCATCTTGGCCTGCTTCAGGGAGAAACCTCCATTTTTCTCCGCAAAG
541 + + + + + + 600

CATGAGAGACGGGAGTAGAACCGGACGAAGTCCCTCTTTGGAGGTAAAAAGAGGCGTTTC
a VLSALILACFREKPPFFSAK -

AATGTCGACCCTAACCCCTACCCTGAGACCAGCGCTGGCTTTCTCTCCCGCCTGTTTTTC 
601 + + + + + + 660 

TTACAGCTGGGATTGGGGATGGGACTCTGGTCGCGACCGAAAGAGAGGGCGGACAAAAAG 
NVDPNPYPETSAGFLSRLFF - 

TGGTGGTTCACAAAGATGGCCATCTATGGCTACCGGCATCCCCTGGAGGAGAAGGACCTC 
661 + + + + + + 720 

ACCACCAAGTGTTTCTACCGGTAGATACCGATGGCCGTAGGGGACCTCCTCTTCCTGGAG 
WWFTKMAIYGYRHPLEEKDL - 

TGGTCCCTAAAGGAAGAGGACAGATCCCAGATGGTGGTGCAGCAGCTGCTGGAGGCATGG 
721 + + + + + + 780 

ACCAGGGATTTCCTTCTCCTGTCTAGGGTCTACCACCACGTCGTCGACGACCTCCGTACC 
WSLKEEDRSQMVVQQLLEAW - 

AGGAAGCAGGAAAAGCAGACGGCACGACACAAGGCTTCAGCAGCACCTGGGAAAAATGCC 
TCCTTCGTCCTTTTCGTCTGCCGTGCTGTGTTCCGAAGTCGTCGTGGACCCTTTTTACGG
RKQEKQTARHKASAAPGKNA - 

TCCGGCGAGGACGAGGTGCTGCTGGGTGCCCGGCCCAGGCCCCGGAAGCCCTCCTTCCTG 

841 + + + + + + 900

AGGCCGCTCCTGCTCCACGACGACCCACGGGCCGGGTCCGGGGCCTTCGGGAGGAAGGAC 



SGEDEVLLGARPRPRKPSFL -



AAGGCCCTGCTGGCCACCTTCGGCTCCAGCTTCCTCATCAGTGCCTGCTTCAAGCTTATC
TTCCGGGACGACCGGTGGAAGCCGAGGTCGAAGGAGTAGTCACGGACGAAGTTCGAATAG 
KALLATFGSSFLISACFKLI - 

CAGGACCTGCTCTCCTTCATCAATCCACAGCTGCTCAGCATCCTGATCAGGTTTATCTCC 
GTCCTGGACGAGAGGAAGTAGTTAGGTGTCGACGAGTCGTAGGACTAGTCCAAATAGAGG 
QDLLSFINPQLLSILIRFIS - 

AACCCCATGGCCCCCTCCTGGTGGGGCTTCCTGGTGGCTGGGCTGATGTTCCTGTGCTCC 



1021 + + + + + + 1080



TTGGGGTACCGGGGGAGGACCACCCCGAAGGACCACCGACCCGACTACAAGGACACGAGG 
NPMAPSWWGFLVAGLMFLCS - 

ATGATGCAGTCGCTGATCTTACAACACTATTACCACTACATCTTTGTGACTGGGGTGAAG
1081 + + + + + + 1140



TACTACGTCAGCGACTAGAATGTTGTGATAATGGTGATGTAGAAACACTGACCCCACTTC 
MMQSLILQHYYHYIFVTGVK - 

TTTCGTACTGGGATCATGGGTGTCATCTACAGGAAGGCTCTGGTTATCACCAACTCAGTC
1141 + + + + + + 1200



AAAGCATGACCCTAGTACCCACAGTAGATGTCCTTCCGAGACCAATAGTGGTTGAGTCAG 
FRTGIMGVIYRKALVITNSV - 

AAACGTGCGTCCACTGTGGGGGAAATTGTCAACCTCATGTCAGTGGATGCCCAGCGCTTC
1201 + + + + + + 1260

TTTGCACGCAGGTGACACCCCCTTTAACAGTTGGAGTACAGTCACCTACGGGTCGCGAAG
a KRASTVGEIVNLMSVDAQRF -



ATGGACCTTGCCCCCTTCCTCAATCTGCTGTGGTCAGCACCCCTGCAGATCATCCTGGCG 



1261 + + + + + + 1320



TACCTGGAACGGGGGAAGGAGTTAGACGACACCAGTCGTGGGGACGTCTAGTAGGACCGC 
a MDLAPFLNLLWSAPLQIILA 

ATCTACTTCCTCTGGCAGAACCTAGGTCCCTCTGTCCTGGCTGGAGTCGCTTTCATGGTC 
TAGATGAAGGAGACCGTCTTGGATCCAGGGAGACAGGACCGACCTCAGCGAAAGTACCAG
a IYFLWQNLGPSVLAGVAFMV- 

TTGCTGATTCCACTCAACGGAGCTGTGGCCGTGAAGATGCGCGCCTTCCAGGTAAAGCAA 
AACGACTAAGGTGAGTTGCCTCGACACCGGCACTTCTACGCGCGGAAGGTCCATTTCGTT 
a LLIPLNGAVAVKMRAFQVKQ - 

ATGAAATTGAAGGACTCGCGCATCAAGCTGATGAGTGAGATCCTGAACGGCATCAAGGTG 
1441 + + + + + + 1500

TACTTTAACTTCCTGAGCGCGTAGTTCGACTACTCACTCTAGGACTTGCCGTAGTTCCAC 
a MKLKDSRIKLMSEILNGIKV - 

CTGAAGCTGTACGCCTGGGAGCCCAGCTTCCTGAAGCAGGTGGAGGGCATCCGGCAGGGT 
1501 + + + + + + 1560 

GACTTCGACATGCGGACCCTCGGGTCGAAGGACTTCGTCCACCTCCCGTAGGCCGTCCCA 
a LKLYAWEPSFLKQVEGIRQG - 

GAGCTCCAGCTGCTGCGCACGGCGGCCTACCTCCACACCACAACCACCTTCACCTGGATG
1561 + + + + + + 1620 

CTCGAGGTCGACGACGCGTGCCGCCGGATGGAGGTGTGGTGTTGGTGGAAGTGGACCTAC 
a ELQLLRTAAYLHTTTTFTWM - 

TGCAGCCCCTTCCTGGTGACCCTGATCACCCTCTGGGTGTACGTGTACGTGGACCCAAAC
1621 + + + + + + 1680
ACGTCGGGGAAGGACCACTGGGACTAGTGGGAGACCCACATGCACATGCACCTGGGTTTG 



Figure 14D 




a CSPFLVTLITLWVYVYVDPN -

AATGTGCTGGACGCCGAGAAGGCCTTTGTGTCTGTGTCCTTGTTTAATATCTTAAGACTT 

1681 + + + + + + 1740 

TTACACGACCTGCGGCTCTTCCGGAAACACAGACACAGGAACAAATTATAGAATTCTGAA 

a NVLDAEKAFVSVSLFNILRL -

CCCCTCAACATGCTGCCCCAGTTAATCAGCAACCTGACTCAGGCCAGTGTGTCTCTGAAA 
1741 + + + + + + 1800 

GGGGAGTTGTACGACGGGGTCAATTAGTCGTTGGACTGAGTCCGGTCACACAGAGACTTT 
a PLNMLPQLISNLTQASVSLK -

CGGATCCAGCAATTCCTGAGCCAAGAGGAACTTGACCCCCAGAGTGTGGAAAGAAAGACC 
1801 + + + + + + 1860

GCCTAGGTCGTTAAGGACTCGGTTCTCCTTGAACTGGGGGTCTCACACCTTTCTTTCTGG 
RIQQFLSQEELDPQSVERKT - 

ATCTCCCCAGGCTATGCCATCACCATACACAGTGGCACCTTCACCTGGGCCCAGGACCTG
1861 + + + + + + 1920 

TAGAGGGGTCCGATACGGTAGTGGTATGTGTCACCGTGGAAGTGGACCCGGGTCCTGGAC 
ISPGYAITIHSGTFTWAQDL - 

CCCCCCACTCTGCACAGCCTAGACATCCAGGTCCCGAAAGGGGCACTGGTGGCCGTGGTG 
1921 + + + + + + 1980

GGGGGGTGAGACGTGTCGGATCTGTAGGTCCAGGGCTTTCCCCGTGACCACCGGCACCAC 
PPTLHSLDIQVPKGALVAVV - 

GGGCCTGTGGGCTGTGGGAAGTCCTCCCTGGTGTCTGCCCTGCTGGGAGAGATGGAGAAG 
1981 + + + + + + 2040 

CCCGGACACCCGACACCCTTCAGGAGGGACCACAGACGGGACGACCCTCTCTACCTCTTC 
GPVGCGKSSLVSALLGEMEK - 

CTAGAAGGCAAAGTGCACATGAAGGCATGGATCCAGAACTGCACTCTTCAGGAAAACGTG 
GATCTTCCGTTTCACGTGTACTTCCGTACCTAGGTCTTGACGTGAGAAGTCCTTTTGCAC 
LEGKVHMKAWIQNCTLQENV - 



CTTTTCGGCAAAGCCCTGAACCCCAAGCGCTACCAGCAGACTCTGGAGGCCTGTGCCTTG

2101 + + + + + + 2160 

GAAAAGCCGTTTCGGGACTTGGGGTTCGCGATGGTCGTCTGAGACCTCCGGACACGGAAC 

LFGKALNPKRYQQTLEACAL - 
CTAGCTGACCTGGAGATGCTGCCTGGTGGGGATCAGACAGAGATTGGAGAGAAGGGCATT
GATCGACTGGACCTCTACGACGGACCACCCCTAGTCTGTCTCTAACCTCTCTTCCCGTAA 
LADLEMLPGGDQTEIGEKGI 

AACCTGTCTGGGGGCCAGCGGCAGCGGGTCAGTCTGGCTCGAGCTGTTTACAGTGATGCC 



2221 + + + + + + 2280



TTGGACAGACCCCCGGTCGCCGTCGCCCAGTCAGACCGAGCTCGACAAATGTCACTACGG 
a NLSGGQRQRVSLARAVYSDA - 

GATATTTTCTTGCTGGATGACCCACTGTCCGCGGTGGACTCTCATGTGGCCAAGCACATC
2281 + + + + + + 2340

CTATAAAAGAACGACCTACTGGGTGACAGGCGCCACCTGAGAGTACACCGGTTCGTGTAG

a DIFLLDDPLSAVDSHVAKHI -

TTTGACCACGTCATCGGGCCAGAAGGCGTGCTGGCAGGCAAGACGCGAGTGCTGGTGACG 
2341 + + + + + + 2400

AAACTGGTGCAGTAGCCCGGTCTTCCGCACGACCGTCCGTTCTGCGCTCACGACCACTGC 
a FDHVIGPEGVLAGKTRVLVT - 

CACGGCATTAGCTTCCTGCCCCAGACAGACTTCATCATTGTGCTAGCTGATGGACAGGTG 
2401 + + + + + + 2460 

GTGCCGTAATCGAAGGACGGGGTCTGTCTGAAGTAGTAACACGATCGACTACCTGTCCAC 
a HGISFLPQTDFIIVLADGQV - 

TCTGAGATGGGCCCGTACCCAGCCCTGCTGCAGCGCAACGGCTCCTTTGCCAACTTTCTC 
2461 + + + + + + 2520 

AGACTCTACCCGGGCATGGGTCGGGACGACGTCGCGTTGCCGAGGAAACGGTTGAAAGAG 

a SEMGPYPALLQRNGSFANFL- 
TGCAACTATGCCCCCGATGAGGACCAAGGGCACCTGGAGGACAGCTGGACCGCGTTGGAA

2521 + + + + + + 2580 

ACGTTGATACGGGGGCTACTCCTGGTTCCCGTGGACCTCCTGTCGACCTGGCGCAACCTT 

a CNYAPDEDQGHLEDSWTALE- 

GGTGCAGAGGATAAGGAGGCACTGCTGATTGAAGACACACTCAGCAACCACACGGATCTG 

2581 + + + + + + 2640

CCACGTCTCCTATTCCTCCGTGACGACTAACTTCTGTGTGAGTCGTTGGTGTGCCTAGAC 

a GAEDKEALLIEDTLSNHTDL - 

ACAGACAATGATCCAGTCACCTATGTGGTCCAGAAGCAGTTTATGAGACAGCTGAGTGCC 
2641 + + + + + + 2700

TGTCTGTTACTAGGTCAGTGGATACACCAGGTCTTCGTCAAATACTCTGTCGACTCACGG
a TDNDPVTYVVQKQFMRQLSA -

CTGTCCTCAGATGGGGAGGGACAGGGTCGGCCTGTACCCCGGAGGCACCTGGGTCCATCA 
2701 + + + + + + 2760 

GACAGGAGTCTACCCCTCCCTGTCCCAGCCGGACATGGGGCCTCCGTGGACCCAGGTAGT 

a LSSDGEGQGRPVPRRHLGPS - 

GAGAAGGTGCAGGTGACAGAGGCGAAGGCAGATGGGGCACTGACCCAGGAGGAGAAAGCA 

2761 + + + + + + 2820 

CTCTTCCACGTCCACTGTCTCCGCTTCCGTCTACCCCGTGACTGGGTCCTCCTCTTTCGT 

a EKVQVTEAKADGALTQEEKA- 

GCCATTGGCACTGTGGAGCTCAGTGTGTTCTGGGATTATGCCAAGGCCGTGGGGCTCTGT

2821 + + + + + + 2880 

CGGTAACCGTGACACCTCGAGTCACACAAGACCCTAATACGGTTCCGGCACCCCGAGACA 

a AIGTVELSVFWDYAKAVGLC - 

ACCACGCTGGCCATCTGTCTCCTGTATGTGGGTCAAAGTGCGGCTGCCATTGGAGCCAAT 
TGGTGCGACCGGTAGACAGAGGACATACACCCAGTTTCACGCCGACGGTAACCTCGGTTA 

a TTLAICLLYVGQSAAAIGAN 

GTGTGGCTCAGTGCCTGGACAAATGATGCCATGGCAGACAGTAGACAGAACAACACTTCC 
2941 + + + + + + 3000

CACACCGAGTCACGGACCTGTTTACTACGGTACCGTCTGTCATCTGTCTTGTTGTGAAGG 

a VWLSAWTNDAMADSRQNNTS 

CTGAGGCTGGGCGTCTATGCTGCTTTAGGAATTCTGCAAGGGTTCTTGGTGATGCTGGCA 

3001 + + + + + + 3060

GACTCCGACCCGCAGATACGACGAAATCCTTAAGACGTTCCCAAGAACCACTACGACCGT

a LRLGVYAALGILQGFLVMLA -

GCCATGGCCATGGCAGCGGGTGGCATCCAGGCTGCCCGTGTGTTGCACCAGGCACTGCTG 
CGGTACCGGTACCGTCGCCCACCGTAGGTCCGACGGGCACACAACGTGGTCCGTGACGAC 

a AMAMAAGGIQAARVLHQALL- 

CACAACAAGATACGCTCGCCACAGTCCTTCTTTGACACCACACCATCAGGCCGCATCCTG 

3121 + + + + + + 3180 

GTGTTGTTCTATGCGAGCGGTGTCAGGAAGAAACTGTGGTGTGGTAGTCCGGCGTAGGAC 

a HNKIRSPQSFFDTTPSGRIL- 

AACTGCTTCTCCAAGGACATCTATGTCGTTGATGAGGTTCTGGCCCCTGTCATCCTCATG 

3181 + + + + + + 3240 

TTGACGAAGAGGTTCCTGTAGATACAGCAACTACTCCAAGACCGGGGACAGTAGGAGTAC 

a NCFSKDIYVVDEVLAPVILM - 

CTGCTCAATTCCTTCTTCAACGCCATCTCCACTCTTGTGGTCATCATGGCCAGCACGCCG 

3241 + + + + + + 3300 

GACGAGTTAAGGAAGAAGTTGCGGTAGAGGTGAGAACACCAGTAGTACCGGTCGTGCGGC 

a LLNSFFNAISTLVVIMASTP - 

CTCTTCACTGTGGTCATCCTGCCCCTGGCTGTGCTCTACACCTTAGTGCAGCGCTTCTAT 

3301 + + + + + + 3360

GAGAAGTGACACCAGTAGGACGGGGACCGACACGAGATGTGGAATCACGTCGCGAAGATA 

a LFTVVILPLAVLYTLVQRFY - 

GCAGCCACATCACGGCAACTGAAGCGGCTGGAATCAGTCAGCCGCTCACCTATCTACTCC 
CGTCGGTGTAGTGCCGTTGACTTCGCCGACCTTAGTCAGTCGGCGAGTGGATAGATGAGG
a AATSRQLKRLESVSRSPIYS - 

CACTTTTCGGAGACAGTGACTGGTGCCAGTGTCATCCGGGCCTACAACCGCAGCCGGGAT 
3421 + + + + + + 3480

GTGAAAAGCCTCTGTCACTGACCACGGTCACAGTAGGCCCGGATGTTGGCGTCGGCCCTA
a HFSETVTGASVIRAYNRSRD -

TTTGAGATCATCAGTGATACTAAGGTGGATGCCAACCAGAGAAGCTGCTACCCCTACATC 



3481 + + + + + + 3540



AAACTCTAGTAGTCACTATGATTCCACCTACGGTTGGTCTCTTCGACGATGGGGATGTAG 
a FEIISDTKVDANQRSCYPYI - 

ATCTCCAACCGGTGGCTGAGCATCGGAGTGGAGTTCGTGGGGAACTGCGTGGTGCTCTTT 
3541 + + + + + + 3600

TAGAGGTTGGCCACCGACTCGTAGCCTCACCTCAAGCACCCCTTGACGCACCACGAGAAA

a ISNRWLSIGVEFVGNCVVLF -

GCTGCACTATTTGCCGTCATCGGGAGGAGCAGCCTGAACCCGGGGCTGGTGGGCCTTTCT 

3601 + + + + + + 3660

CGACGTGATAAACGGCAGTAGCCCTCCTCGTCGGACTTGGGCCCCGACCACCCGGAAAGA 
a AALFAVIGRSSLNPGLVGLS - 

GTGTCCTACTCCTTGCAGGTGACATTTGCTCTGAACTGGATGATACGAATGATGTCAGAT 
3661 + + + + + + 3720

CACAGGATGAGGAACGTCCACTGTAAACGAGACTTGACCTACTATGCTTACTACAGTCTA 
a VSYSLQVTFALNWMIRMMSD- 

TTGGAATCTAACATCGTGGCTGTGGAGAGGGTCAAGGAGTACTCCAAGACAGAGACAGAG 
3721 + + + + + + 3780

AACCTTAGATTGTAGCACCGACACCTCTCCCAGTTCCTCATGAGGTTCTGTCTCTGTCTC 
a LESNIVAVERVKEYSKTETE - 

GCGCCCTGGGTGGTGGAAGGCAGCCGCCCTCCCGAAGGTTGGCCCCCACGTGGGGAGGTG 

3781 + + + + + + 3840

CGCGGGACCCACCACCTTCCGTCGGCGGGAGGGCTTCCAACCGGGGGTGCACCCCTCCAC
a APWVVEGSRPPEGWPPRGEV -

GAGTTCCGGAATTATTCTGTGCGCTACCGGCCGGGCCTAGACCTGGTGCTGAGAGACCTG 
3841 + + + + + + 3900

CTCAAGGCCTTAATAAGACACGCGATGGCCGGCCCGGATCTGGACCACGACTCTCTGGAC 
a EFRNYSVRYRPGLDLVLRDL- 

AGTCTGCATGTGCACGGTGGCGAGAAGGTGGGGATCGTGGGCCGCAGTGGGGCTGGCAAG 
3901 + + + + + + 3960

TCAGACGTACACGTGCCACCGCTCTTCCACCCCTAGCACCCGGCGTGACCCCGACCGTTC 

a SLHVHGGEKVGIVGRTGAGK - 

TCTTCCATGACCCTTTGCCTGTTCCGCATCCTGGAGGCGGCAAAGGGTGAAATCCGCATT 
AGAAGGTACTGGGAAACGGACAAGGCGTAGGACCTCCGCCGTTTCCCACTTTAGGCGTAA 

a SSMTICLFRILEAAKGEIRI - 

GATGGCCTCAATGTGGCAGACATCGGCCTCCATGACCTGCGCTCTCAGCTGACCATCATC 
CTACCGGAGTTACACCGTCTGTAGCCGGAGGTACTGGACGCGAGAGTCGACTGGTAGTAG 

a DGLNVADIGLHDLRSQLTII -

CCGCAGGACCCCATCCTGTTCTCGGGGACCCTGCGCATGAACCTGGACCCCTTCGGCAGC 
4081 + + + + + + 4140

GGCGTCCTGGGGTAGGACAAGAGCCCCTGGGACGCGTACTTGGACCTGGGGAAGCCGTCG 

a PQDPILFSGTLRMNLDPFGS- 

TACTCAGAGGAGGACATTTGGTGGGCTTTGGAGCTGTCCCACCTGCACACGTTTGTGAGC

4141 + + + + + + 4200

ATGAGTCTCCTCCTGTAAACCACCCGAAACCTCGACAGGGTGGACGTGTGCAAACACTCG

a YSEEDIWWALELSHLHTFVS - 

TCCCAGCCGGCAGGCCTGGACTTCCAGTGCTCAGAGGGCGGGGAGAATCTCAGCGTGGGC 
4201 + + + + + + 4260

AGGGTCGGCCGTCCGGACCTGAAGGTCACGAGTCTCCCGCCCCTCTTAGAGTCGCACCCG 
a SQPAGLDFQCSEGGENLSVG -

CAGAGGCAGCTCGTGTGCCTGGCCCGAGCCCTGCTCCGCAAGAGCCGCATCCTGGTTTTA 
GTCTCCGTCGAGCACACGGACCGGGCTCGGGACGAGGCGTTCTCGGCGTAGGACCAAAAT 

a QRQLVCLARALLRKSRILVL - 

GACGAGGCCACAGCTGCCATCGACCTGGAGACTGACAACCTCATCGAGGCTACCATCCGC 

4321 + + + + + + 4380

CTGCTCCGGTGTCGACGGTAGCTGGACCTCTGACTGTTGGAGTAGGTCCGATGGTAGGCG

a DEATAAIDLETDNLIQATIR -

ACCCAGTTTGATACCTGCACTGTCCTGACCATCGCACACCGGCTTAACACTATCATGGAC 

4381 + + + + + + 4440 

TGGGTCAAACTATGGACGTGACAGGACTGGTAGCGTGTGGCCGAATTGTGATAGTACCTG 

a TQFDTCTVLTIAHRLNTIMD- 

TACACCAGGGTCCTGGTCCTGGACAAAGGAGTAGTAGCTGAATTTGATTCTCCAGCCAAC 

4441 + + + + + + 4500 

ATGTGGTCCCAGGACCAGGACCTGTTTCCTCATCATCGACTTAAACTAAGAGGTCGGTTG 

a YTRVLVLDKGVVAEFDSPAN - 

CTCATTGCAGCTAGAGGCATCTTCTACGGGATGGCCAGAGATGCTGGACTTGCCTAA

4501 + + + + + 4557 

GAGTAACGTCGATCTCCGTAGAAGATGCCCTACCGGTCTCTACGACCTGAACGGATT 

a LIAARGIFYGMARDAGLA * - 

MOAT E cDNA AND AMINO ACID SEQUENCE ENCODED THEREBY
ATGGCCGCGCCTGCTGAGCCCTGCGCGGGGCAGGGGGTCTGGAACCAGACAGAGCCTGAA
TACCGGCGCGGACGACTCGGGACGCGCCCCGTCCCCCAGACCTTGGTCTGTCTCGGACTT 
a MAAPAEPCAGQGVWNQTEPE - 

CCTGCCGCCACCAGCCTGCTGAGCCTGTGCTTCCTGAGAACAGCAGGGGTCTGGGTACCC 
61 + + + + + + 120

GGACGGCGGTGGTCGGACGACTCGGACACGAAGGACTCTTGTCGTCCCCAGACCCATGGG 
a PAATSLLSLCFLRTAGVWVP - 

CCCATGTACCTCTGGGTCCTTGGTCCCATCTACCTCCTCTTCATCCACCACCATGGCCGG 
GGGTACATGGAGACCCAGGAACCAGGGTAGATGGAGGAGAAGTAGGTGGTGGTACCGGCC 
a PMYLWVLGPIYLLFIHHHGR - 

GGCTACCTCCGGATGTCCCCACTCTTCAAAGCCAAGATGGTGCTTGGATTCGCCCTCATA 
181 + + + + + + 240 

CCGATGGAGGCCTACAGGGGTGAGAAGTTTCGGTTCTACCACGAACCTAAGCGGGAGTAT 

a GYLRMSPLFKAKMVLGFALI - 

GTCCTGTGTACCTCCAGCGTGGCTGTCGCTCTTTGGAAAATCCAACAGGGAACGCCTGAG 
241 + + + + + + 300 

CAGGACACATGGAGGTCGCACCGACAGCGAGAAACCTTTTAGGTTGTCCCTTGCGGACTC 

a VLCTSSVAVALWKIQQGTPE- 

GCCCCAGAATTCCTCATTCATCCTACTGTGTGGCTCACCACGATGAGCTTCGCAGTGTTC 
301 + + + + + + 360 

CGGGGTCTTAAGGAGTAAGTAGGATGACACACCGAGTGGTGCTACTCGAAGCGTCACAAG 
a APEFLIHPTVWLTTMSFAVF - 

CTGATTCACACCGAGAGGAAAAAGGGAGTCCAGTCATCTGGAGTGCTGTTTGGTTACTGG 
361 + + + + + + 420 

GACTAAGTGTGGCTCTCCTTTTTCCCTCAGGTCAGTAGACCTCACGACAAACCAATGACC 
a LIHTERKKGVQSSGVLFGYW - 

gaagagacgaaacagaacggtcgatgg'ttgcgIcgggtcgtccggaggcctcgcccgaag 
a LLCFVLPATNAAQQASGAGF - 

CAGAGCGACCCTGTCCGCCACCTGTCCACCTACCTATGCCTGTCTCTGGTGGTGGCACAG 
481 + + + + + + 540 

GTCTCGCTGGGACAGGCGGTGGACAGGTGGATGGATACGGACAGAGACCACCACCGTGTC 

a QSDPVRHLSTYLCLSLVVAQ - 

TTTGTGCTGTCCTGCCTGGCGGATCAACCCCCCTTCTTCCCTGAAGACCCCCAGCAGTCT 
541 + + + + + + 600 

AAACACGACAGGACGGACCGCCTAGTTGGGGGGAAGAAGGGACTTCTGGGGGTCGTCAGA 

a FVLSCLADQPPFFPEDPQQS - 

TTGGGGACAGGTCTCTGACCCCGTCGGAAG^GAGG^GGTGCAAGACCACCCA^ 
a NPCPETGAAFPSKATFWWVS - 

GGCCTGGTCTGGAGGGGATACAGGAGGCCACTGAGACCAAAAGACCTCTGGTCGCTTGGG 
661 + + + + + + 720 

CCGGACCAGACCTCCCCTATGTCCTCCGGTGACTCTGGTTTTCTGGAGACCAGCGAACCC 

a GLVWRGYRRPLRPKDLWSLG - 



AGAGAAAACTCCTCAGAAGAACTTGTTTCCCGGCTTGAAAAGGAGTGGATGAGGAACCGC 
721 + + + + + + 780 

TCTCTTTTGAGGAGTCTTCTTGAACAAAGGGCCGAACTTTTCCTCACCTACTCCTTGGCG 
a RENSSEELVSRLEKEWMRNR - 



AGTGCAGCCCGGAGGCACAACAAGGCAATAGCATTTAAAAGGAAAGGCGGCAGTGGCATG 
781 + + + + + + 840 

TCACGTCGGGCCTCCGTGTTGTTCCGTTATCGTAAATTTTCCTTTCCGCCGTCACCGTAC 
a SAARRHNKAIAFKRKGGSGM - 

AAGGCTCCAGAGACCGAGCCCTTCCTACGGCAAGAAGGGAGCCAGTGGCGCCCACTGCTG 
841 + + + + + + 900 

TTCCGAGGTCTCTGGCTCGGGAAGGATGCCGTTCTTCCCTCGGTCACCGCGGGTGACGAC 
a KAPETEPFLRQEGSQWRPLL - 

AAGGCCATCTGGCAGGTGTTCCATTCTACCTTCCTCCTGGGGACCCTCAGCCTCATCATC 
901 + + + + + + 960 

TTCCGGTAGACCGTCCACAAGGTAAGATGGAAGGAGGACCCCTGGGAGTCGGAGTAGTAG 
a KAIWQVFHSTFLLGTLSLII - 

AGTGATGTCTTCAGGTTCACTGTCCCCAAGCTGCTCAGCCTTTTCCTGGAGTTTATTGGT 
TCACTACAGAAGTCCAAGTGACAGGGGTTCGACGAGTCGGAAAAGGACCTCAAATAACCA 
a SDVFRFTVPKLLSLFLEFIG - 

GATCCCAAGCCTCCAGCCTGGAAGGGCTACCTCCTCGCCGTGCTGATGTTCCTCTCAGCC 
1021 + + + + + + 1080 

CTAGGGTTCGGAGGTCGGACCTTCCCGATGGAGGAGCGGCACGACTACAAGGAGAGTCGG 
a DPKPPAWKGYLLAVLMFLSA - 

TGCCTGCAAACGCTGTTTGAGCAGCAGAACATGTACAGGCTCAAGGTGCCGCAGATGAGG 
1081 + + + + + + 1140 

ACGGACGTTTGCGACAAACTCGTCGTCTTGTACATGTCCGAGTTCCACGGCGTCTACTCC 
a CLQTLFEQQNMYRLKVPQMR - 

TTGCGGTCGGCCATCACTGGCCTGGTGTACAGAAAGGTCCTGGCTCTGTCCAGCGGCTCC 
1141 + + + + + + 1200 

AACGCCAGCCGGTAGTGACCGGACCACATGTCTTTCCAGGACCGAGACAGGTCGCCGAGG 

a LRSAITGLVYRKVLALSSGS - 

AGAAAGGCCAGTGCGGTGGGTGATGTGGTCAATCTGGTGTCCGTGGACGTGCAGCGGCTG 
1201 + + + + + + 1260 

TCTTTCCGGTCACGCCACCCACTACACCAGTTAGACCACAGGCACCTGCACGTCGCCGAC 
a RKASAVGDVVNLVSVDVQRL - 

ACCGAGAGCGTCCTCTACCTCAACGGGCTGTGGCTGCCTCTCGTCTGGATCGTGGTCTGC 
1261 + + + + + + 1320 

TGGCTCTCGCAGGAGATGGAGTTGCCCGACACCGACGGAGAGCAGACCTAGCACCAGACG 

a TESVLYLNGLWLPLVWIVVC - 

TTCGTCTATCTCTGGCAGCTCCTGGGGCCCTCCGCCCTCACTGCCATCGCTGTCTTCCTG 
AAGCAGATAGAGACCGTCGAGGACCCCGGGAGGCGGGAGTGACGGTAGCGACAGAAGGAC 

a FVYLWQLLGPSALTAIAVFL 

AGCCTCCTCCCTCTGAATTTCTTCATCTCCAAGAAAAGGAACCACCATCAGGAGGAGCAA 
1381 + + + + + + 1440 

TCGGAGGAGGGAGACTTAAAGAAGTAGAGGTTCTTTTCCTTGGTGGTAGTCCTCCTCGTT 
a SLLPLNFFISKKRNHHQEEQ - 

ATGAGGCAGAAGGACTCACGGGCACGGCTCACCAGCTCTATCCTCAGGAACTCGAAGACC 
1441 + + + + + + 1500 

TACTCCGTCTTCCTGAGTGCCCGTGCCGAGTGGTCGAGATAGGAGTCCTTGAGCTTCTGG 
a MRQKDSRARLTSSILRNSKT - 

ATCAAGTTCCATGGCTGGGAGGGAGCCTTTCTGGACAGAGTCCTGGGCATCCGAGGCCAG 
1501 + + + + + + 1560 

TAGTTCAAGGTACCGACCCTCCCTCGGAAAGACCTGTCTCAGGACCCGTAGGCTCCGGTC 
a IKFHGWEGAFLDRVLGIRGQ - 

GAGCTGGGCGCCTTGCGGACCTCCGGCCTCCTCTTCTCTGTGTCGCTGGTGTCCTTCCAA 
1561 + + + + + + 1620 

CTCGACCCGCGGAACGCCTGGAGGCCGGAGGAGAAGAGACACAGCGACCACAGGAAGGTT 
a ELGALRTSGLLFSVSLVSFQ - 

GTGTCTACATTTCTGGTCGCACTGGTGGTGTTTGCTGTCCACACTCTGGTGGCCGAGAAT 
1621 + + + + + + 1680 

CACAGATGTAAAGACCAGCGTGACCACCACAAACGACAGGTGTGAGACCACCGGCTCTTA 
a VSTFLVALVVFAVHTLVAEN - 

GCTATGAATGCAGAGAAAGCCTTTGTGACTCTCACAGTTCTCAACATCCTCAACAAGGCC 

1681 + + + + + + 1740 

CGATACTTACGTCTCTTTCGGAAACACTGAGAGTGTCAAGAGTTGTAGGAGTTGTTCCGG 

a AMNAEKAFVTLTVLNILNKA 

CAGGCTTTCCTGCCCTTCTCCATCCACTCCCTCGTCCAGGCCCGGGTGTCCTTTGACCGT 
GTCCGAAAGGACGGGAAGAGGTAGGTGAGGGAGCAGGTCCGGGCCCACAGGAAACTGGCA 

a QAFLPFSIHSLVQARVSFDR- 

CTGGTCACCTTCCTCTGCCTGGAAGAAGTTGACCCTGGTGTCGTAGACTCAAGTTCCTCT 

1801 + + + + + + 1860 

GACCAGTGGAAGGAGACGGACCTTCTTCAACTGGGACCACAGCATCTGAGTTCAAGGAGA 

a LVTFLCLEEVDPGVVDSSSS- 

GGAAGCGCTGCCGGGAAGGATTGCATCACCATACACAGTGCCACCTTCGCCTGGTCCCAG 
1861 + + + + + + 1920 

CCTTCGCGACGGCCCTTCCTAACGTAGTGGTATGTGTCACGGTGGAAGCGGACCAGGGTC 

a GSAAGKDCITIHSATFAWSQ - 

GAAAGCCCTCCCTGCCTCCACAGAATAAACCTCACGGTGCCCCAGGGCTGTCTGCTGGCT 

1921 + + + + + + 1980 

CTTTCGGGAGGGACGGAGGTGTCTTATTTGGAGTGCCACGGGGTCCCGACAGACGACCGA 

a ESPPCLHRINLTVPQGCLLA - 

GTTGTCGGTCCAGTGGGGGCAGGGAAGTCCTCCCTGCTGTCCGCCCTCCTTGGGGAGCTG 
1981 + + + + + + 2040 

CAACAGCCAGGTCACCCCCGTCCCTTCAGGAGGGACGACAGGCGGGAGGAACCCCTCGAC 

a VVGPVGAGKSSLLSALLGEL- 

TCAAAGGTGGAGGGGTTCGTGAGCATCGAGGGTGCTGTGGCCTACGTGCCCCAGGAGGCC 

2041 + + + + + + 2100 

AGTTTCCACCTCCCCAAGCACTCGTAGCTCCCACGACACCGGATGCACGGGGTCCTCCGG 

a SKVEGFVSIEGAVAYVPQEA - 

TGGGTGCAGAACACCTCTGTGGTAGAGAATGTGTGCTTCGGGCAGGAGCTGGACCCACCC 
2101 + + + + + + 2160 

ACCCACGTCTTGTGGAGACACCATCTCTTACACACGAAGCCCGTCCTCGACCTGGGTGGG 

a WVQNTSVVENVCFGQELDPP - 

TGGCTGGAGAGAGTACTAGAAGCCTGTGCCCTGCAGCCAGATGTGGACAGCTTCCCTGAG 
2161 + + + + + + 2220 

ACCGACCTCTCTCATGATCTTCGGACACGGGACGTCGGTCTACACCTGTCGAAGGGACTC 
a WLERVLEACALQPDVDSFPE - 

GGAATCCACACTTCAATTGGGGAGCAGGGCATGAATCTCTCCGGAGGCCAGAAGCAGCGG 
2221 + + + + + + 2280 

CCTTAGGTGTGAAGTTAACCCCTCGTCCCGTACTTAGAGAGGCCTCCGGTCTTCGTCGCC 
a GIHTSIGEQGMNLSGGQKQR - 

CTGAGCCTGGCCCGGGCTGTATACAGAAAGGCAGCTGTGTACCTGCTGGATGACCCCCTG 
2281 + + + + + + 2340 

GACTCGGACCGGGCCCGACATATGTCTTTCCGTCGACACATGGACGACCTACTGGGGGAC 

a LSLARAVYRKAAVYLLDDPL - 

GCGGCCCTGGATGCCCACGTTGGCCAGCATGTCTTCAACCAGGTCATTGGGCCTGGTGGG 
2341 + + + + + + 2400 

CGCCGGGACCTACGGGTGCAACCGGTCGTACAGAAGTTGGTCCAGTAACCCGGACCACCC 

a AALDAHVGQHVFNQVIGPGG - 

CTACTCCAGGGAACAACACGGATTCTCGTGACGCACGCACTCCACATCCTGCCCCAGGCT 
2401 + + + + + + 2460 

GATGAGGTCCCTTGTTGTGCCTAAGAGCACTGCGTGCGTGAGGTGTAGGACGGGGTCCGA 
a LLQGTTRILVTHALHILPQA - 

GATTGGATCATAGTGCTGGCAAATGGGGCCATCGCAGAGATGGGTTCCTACCAGGAGCTT 
2461 + + + + + + 2520 

CTAACCTAGTATCACGACCGTTTACCCCGGTAGCGTCTCTACCCAAGGATGGTCCTCGAA 
a DWIIVLANGAIAEMGSYQEL - 



CTGCAGAGGAAGGGGGCCCTCGTGTGTCTTCTGGATCAAGCCAGACAGCCAGGAGATAGA 
2521 + + + + + + 2580 

GACGTCTCCTTCCCCCGGGAGCACACAGAAGACCTAGTTCGGTCTGTCGGTCCTCTATCT 

a LQRKGALVCLLDQARQPGDR - 

GGAGAAGGAGAAACAGAACCTGGGACCAGCACCAAGGACCCCAGAGGCACCTCTGCAGGC 
CCTCTTCCTCTTTGTCTTGGACCCTGGTCGTGGTTCCTGGGGTCTCCGTGGAGACGTCCG 

a GEGETEPGTSTKDPRGTSAG - 

AGGAGGCCCGAGCTTAGACGCGAGAGGTCCATCAAGTCAGTCCCTGAGAAGGACCGTACC 
TCCTCCGGGCTCGAATCTGCGCTCTCCAGGTAGTTCAGTCAGGGACTCTTCCTGGCATGG 

a RRPELRRERSIKSVPEKDRT - 

ACTTCAGAAGCCCAGACAGAGGTTCCTCTGGATGACCCTGACAGGGCAGGATGGCCAGCA 
2701 + + + + + + 2760 

TGAAGTCTTCGGGTCTGTCTCCAAGGAGACCTACTGGGACTGTCCCGTCCTACCGGTCGT 
a TSEAQTEVPLDDPDRAGWPA - 

GGAAAGGACAGCATCCAATACGGCAGGGTGAAGGCCACAGTGCACCTGGCCTACCTGCGT 
2761 + + + + + + 2820 

CCTTTCCTGTCGTAGGTTATGCCGTCCCACTTCCGGTGTCACGTGGACCGGATGGACGCA 
a GKDSIQYGRVKATVHLAYLR- 

GCCGTGGGCACCCCCCTCTGCCTCTACGCACTCTTCCTCTTCCTCTGCCAGCAAGTGGCC 
2821 + + + + + + 2880 

CGGCACCCGTGGGGGGAGACGGAGATGCGTGAGAAGGAGAAGGAGACGGTCGTTCACCGG 
a AVGTPLCLYALFLFLCQQVA - 

TCCTTCTGCCGGGGCTACTGGCTGAGCCTGTGGGCGGACGACCCTGCAGTAGGTGGGCAG 

2881 + + + + + + 2940 

AGGAAGACGGCCCCGATGACCGACTCGGACACCCGCCTGCTGGGACGTCATCCACCCGTC 

a SFCRGYWLSLWADDPAVGGQ - 

CAGACGCAGGCAGCCCTGCGTGGCGGGATCTTCGGGCTCCTCGGCTGTCTCCAAGCCATT 

GTCTGCGTCCGTCGGGACGCACCGCCCTAGAAGCCCGAGGAGCCGACAGAGGTTCGGTAA 

a QTQAALRGGIFGLLGCLQAI - 



GGGCTGTTTGCCTCCATGGCTGCGGTGCTCCTAGGTGGGGCCCGGGCATCCAGGTTGCTC 
3001 + + + + + + 3060 

CCCGACAAACGGAGGTACCGACGCCACGAGGATCCACCCCGGGCCCGTAGGTCCAACGAG 



a GLFASMAAVLLGGARASRLL - 



TTCCAGAGGCTCCTGTGGGATGTGGTGCGATCTCCCATCAGCTTCTTTGAGCGGACACCC 
3061 + + + + + + 3120 

AAGGTCTCCGAGGACACCCTACACCACGCTAGAGGGTAGTCGAAGAAACTCGCCTGTGGG 
a FQRLLWDVVRSPISFFERTP - 

ATTGGTCACCTGCTAAACCGCTTCTCCAAGGAGACAGACACGGTTGACGTGGACATTCCA 
3121 + + + + + + 3180 

TAACCAGTGGACGATTTGGCGAAGAGGTTCCTCTGTCTGTGCCAACTGCACCTGTAAGGT 

a IGHLLNRFSKETDTVDVDIP - 

GACAAACTCCGGTCCCTGCTGATGTACGCCTTTGGACTCCTGGAGGTCAGCCTGGTGGTG 
3181 + + + + + + 3240 

CTGTTTGAGGCCAGGGACGACTACATGCGGAAACCTGAGGACCTCCAGTCGGACCACCAC 
a DKLRSLLMYAFGLLEVSLVV - 

GCAGTGGCTACCCCACTGGCCACTGTGGCCATCCTGCCACTGTTTCTCCTCTACGCTGGG 
3241 + + + + + + 3300 

CGTCACCGATGGGGTGACCGGTGACACCGGTAGGACGGTGACAAAGAGGAGATGCGACCC 

a AVATPLATVAILPLFLLYAG - 

TTTCAGAGCCTGTATGTGGTTAGCTCATGCCAGCTGAGACGCTTGGAGTCAGCCAGCTAC 
3301 + + + + + + 3360 

AAAGTCTCGGACATACACCAATCGAGTACGGTCGACTCTGCGAACCTCAGTCGGTCGATG 
a FQSLYVVSSCQLRRLESASY - 

TCGTCTGTCTGCTCCCACATGGCTGAGACGTTCCAGGGCAGCACAGTGGTCCGGGCATTC 

3361 + + + + + + 3420 

AGCAGACAGACGAGGGTGTACCGACTCTGCAAGGTCCCGTCGTGTCACCAGGCCCGTAAG 

a SSVCSHMAETFQGSTVVRAF - 

CGAACCCAGGCCCCTCTTGTGGCTCAGAACAATGCTCGCGTAGATGAAAGCCAGAGGATC 

3421 + + + + + + 3480 

GCTTGGGTCCGGGGAGAACACCGAGTCTTGTTACGAGCGCATCTACTTTCGGTCTCCTAG 

a RTQAPLVAQNNARVDESQRI - 

AGTTTCCCGCGACTGGTGGCTGACAGGTGGCTTGCGGCCAATGTGGAGCTCCTGGGGAAT 

3481 + + + + + + 3540 

TCAAAGGGCGCTGACCACCGACTGTCCACCGAACGCCGGTTACACCTCGAGGACCCCTTA 

a SFPRLVADRWLAANVELLGN - 

GGCCTGGTGTTTGCAGCTGCCACGTGTGCTGTGCTGAGCAAAGCCCACCTCAGTGCTGGC 

3541 + + + + + + 3600 

CCGGACCACAAACGTCGACGGTGCACACGACACGACTCGTTTCGGGTGGAGTCACGACCG 

a GLVFAAATCAVLSKAHLSAG - 

CTCGTGGGCTTCTCTGTCTCTGCTGCCCTCCAGGTGACCCAGGCACTGCAGTGGGTTGTT 

3601 + + + + + + 3660 

GAGCACCCGAAGAGACAGAGACGACGGGAGGTCCACTGGGTCCGTGACGTCACCCAACAA 

a LVGFSVSAALQVTQALQWVV- 

CGCAACTGGACAGACCTAGAGAACAGCATCGTGTCAGTGGAGCGGATGCAGGACTATGCC 

3661 + + + + + + 3720 

GCGTTGACCTGTCTGGATCTCTTGTCGTAGCACAGTCACCTCGCCTACGTCCTGATACGG 

a RNWTDLENSIVSVERMQDYA - 

TGGACGCCCAAGGAGGCTCCCTGGAGGCTGCCCACATGTGCAGCTCAGCCCCCCTGGCCT 

3721 + + + + + + 3780 

ACCTGCGGGTTCCTCCGAGGGACCTCCGACGGGTGTACACGTCGAGTCGGGGGGACCGGA 

a WTPKEAPWRLPTCAAQPPWP- 

CAGGGCGGGCAGATCGAGTTCCGGGACTTTGGGCTAAGATACCGACCTGAGCTCCCGCTG 

3781 + + + + - + + 3840 

GTCCCGCCCGTCTAGCTCAAGGCCCTGAAACCCGATTCTATGGCTGGACTCGAGGGCGAC 

a QGGQIEFRDFGLRYRPELPL - 

GCTGTGCAGGGCGTGTCCCTCAAGATCCACGCAGGAGAGAAGGTGGGCATCGTTGGCAGG 

3841 + + + + + + 3900 

CGACACGTCCCGCACAGGGAGTTCTAGGTGCGTCCTCTCTTCCACCCGTAGCAACCGTCC 
a AVQGVSLKIHAGEKVGIVGR - 

ACCGGGGCAGGGAAGTCCTCCCTGGCCAGTGGGCTGCTGCGGCTCCAGGAGGCAGCTGAG 

3901 + + + + + + 3960 

TGGCCCCGTCCCTTCAGGAGGGACCGGTCACCCGACGACGCCGAGGTCCTCCGTCGACTC 
a TGAGKSSLASGLLRLQEAAE - 

GGTGGGATCTGGATCGACGGGGTCCCCATTGCCCACGTGGGGCTGCACACACTGCGCTCC 

3961 + + + + + + 4020 

CCACCCTAGACCTAGCTGCCCCAGGGGTAACGGGTGCACCCCGACGTGTGTGACGCGAGG 
a GGIWIDGVPIAHVGLHTLRS - 

AGGATCAGCATCATCCCCCAGGACCCCATCCTGTTCCCTGGCTCTCTGCGGATGAACCTC 

4021 + + + + + + 4080 

TCCTAGTCGTAGTAGGGGGTCCTGGGGTAGGACAAGGGACCGAGAGACGCCTACTTGGAG 

a RISIIPQDPILFPGSLRMNL - 

GACCTGCTGCAGGAGCACTCGGACGAGGCTATCTGGGCAGCCCTGGAGACGGTGCAGCTC 

4081 + + + + + + 4140 

CTGGACGACGTCCTCGTGAGCCTGCTCCGATAGACCCGTCGGGACCTCTGCCACGTCGAG 
a DLLQEHSDEAIWAALETVQL - 

AAAGCCTTGGTGGCCAGCCTGCCCGGCCAGCTGCAGTACAAGTGTGCTGACCGAGGCGAG 

4141 + + + + + + 4200 

TTTCGGAACCACCGGTCGGACGGGCCGGTCGACGTCATGTTCACACGACTGGCTCCGCTC 
a KALVASLPGQLQYKCADRGE - 

GACCTGAGCGTGGGCCAGAAACAGCTCCTGTGTCTGGCACGTGCCCTTCTCCGGAAGACC 
4201 + + + + + + 4260 

CTGGACTCGCACCCGGTCTTTGTCGAGGACACAGACCGTGCACGGGAAGAGGCCTTCTGG 
a DLSVGQKQLLCLARALLRKT - 

CAGATCCTCATCCTGGACGAGGCTACTGCTGCCGTGGACCCTGGCACGGAGCTGCAGATG 
4261 + + + + + + 4320 

GTCTAGGAGTAGGACCTGCTCCGATGACGACGGCACCTGGGACCGTGCCTCGACGTCTAC 

a QILILDEATAAVDPGTELQM - 

CAGGCCATGCTCGGGAGCTGGTTTGCACAGTGCACTGTGCTGCTCATTGCCCACCGCCTG 
GTCCGGTACGAGCCCTCGACCAAACGTGTCACGTGACACGACGAGTAACGGGTGGCGGAC 

a QAMLGSWFAQCTVLLIAHRL 

CGCTCCGTGATGGACTGTGCCCGGGTTCTGGTCATGGACAAGGGGCAGGTGGCAGAGAGC 
4381 + + + + + + 4440 

GCGAGGCACTACCTGACACGGGCCCAAGACCAGTACCTGTTCCCCGTCCACCGTCTCTCG 
a RSVMDCARVLVMDKGQVAES - 

GGCAGCCCGGCCCAGCTGCTGGCCCAGAAGGGCCTGTTTTACAGACTGGCCCAGGAGTCA 
4441 +- + + + + + 4500 

CCGTCGGGCCGGGTCGACGACCGGGTCTTCCCGGACAAAATGTCTGACCGGGTCCTCAGT 

a GSPAQLLAQKGLFYRLAQES - 

GGCCTGGTCTGA 

4501 +- 4512 

CCGGACCAGACT 

a GLV* - 

SEQUENCE LISTING 

<110> Fox Chase Cancer Center 
Kruh, Gary D. 
Lee, Kun 

Belinsky, Martin G. 
Bain, Lisa J. 

<120> MRP-Related ABC Transporter Encoding 
Nucleic Acids and Methods of Use Thereof 



<130> FCCC 98-02 

<150> 60/079,759 
<151> 1998-03-27 

<150> 60/095,153 
<151> 1998-08-03 

<160> 18 

<170> FastSEQ for Windows Version 3.0 

<210> 1 

<211> 4231 

<212> DNA 

<213> Homo sapiens 

<400> 1 

ggacaggcgt ggcggccgga gccccagcat ccctgcttga ggtccaggag cggagcccgc 60 

ggccaccgcc gcctgatcag cgcgaccccg gcccgcgccc gccccgcccg gcaagatgct 120 

gcccgtgtac caggaggtga agcccaaccc gctgcaggac gcgaacatct gctcacgcgt 180 

gttcttctgg tggctcaatc ccttgtttaa aattggccat aaacggagat tagaggaaga 240 

tgatatgtat tcagtgctgc cagaagaccg ctcacagcac cttggagagg agttgcaagg 300 

gttctgggat aaagaagttt taagagctga gaatgacgca cagaagcctt ctttaacaag 360 

agcaatcata aagtgttact ggaaatctta tttagttttg ggaattttta cgttaattga 420 

ggaaagtgcc aaagtaatcc agcccatatt tttgggaaaa attattaatt attttgaaaa 480 

ttatgatccc atggattctg tggctttgaa cacagcgtac gcctatgcca cggtgctgac 540 

tttttgcacg ctcattttgg ctatactgca tcacttatat ttttatcacg ttcagtgtgc 600 

tgggatgagg ttacgagtag ccatgtgcca tatgatttat cggaaggcac ttcgtcttag 660 

taacatggcc atggggaaga caaccacagg ccagatagtc aatctgctgt ccaatgatgt 720 

gaacaagttt gatcaggtga cagtgttctt acacttcctg tgggcaggac cactgcaggc 780 

gatcgcagtg actgccctac tctggatgga gataggaata tcgtgccttg ctgggatggc 840 

agttctaatc attctcctgc ccttgcaaag ctgttttggg aagttgttct catcactgag 900 

gagtaaaact gcaactttca cggatgccag gatcaggacc atgaatgaag ttataactgg 960 

tataaggata ataaaaatgt acgcctggga aaagtcattt tcaaatctta ttaccaattt 1020 

gagaaagaag gagatttcca agattctgag aagttcctgc ctcaggggga tgaatttggc 1080 

ttcgtttttc agtgcaagca aaatcatcgt gtttgtgacc ttcaccacct acgtgctcct 1140 

cggcagtgtg atcacagcca gccgcgtgtt cgtggcagtg acgctgtatg gggctgtgcg 1200 

gctgacggtt accctcttct tcccctcagc cattgagagg gtgtcagagg caatcgtcag 1260 

catccgaaga atccagacct ttttgctact tgatgagata tcacagcgca accgtcagct 1320 

gccgtcagat ggtaaaaaga tggtgcatgt gcaggatttt actgcttttt gggataaggc 1380 

atcagagacc ccaactctac aaggcctttc ctttactgtc agacctggcg aattgttagc 1440 

tgtggtcggc cccgtgggag cagggaagtc atcactgtta agtgccgtgc tcggggaatt 1500 

ggccccaagt cacgggctgg tcagcgtgca tggaagaatt gcctatgtgt ctcagcagcc 1560 

ctgggtgttc tcgggaactc tgaggagtaa tattttattt gggaagaaat atgaaaagga 1620 

acgatatgaa aaagtcataa aggcttgtgc tctgaaaaag gatttacagc tgttggagga 1680 

tggtgatctg actgtgatag gagatcgggg aaccacgctg agtggagggc agaaagcacg 1740 

ggtaaacctt gcaagagcag tgtatcaaga tgctgacatc tatctcctgg acgatcctct 1800 

cagtgcagta gatgcggaag ttagcagaca cttgttcgaa ctgtgtattt gtcaaatttt 1860 

gcatgagaag atcacaattt tagtgactca tcagttgcag tacctcaaag ctgcaagtca 1920 

gattctgata ttgaaagatg gtaaaatggt gcagaagggg acttacactg agttcctaaa 1980 

atctggtata gattttggct cccttttaaa gaaggataat gaggaaagtg aacaacctcc 2040 

agttccagga actcccacac taaggaatcg taccttctca gagtcttcgg tttggtctca 2100 
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acaatcttct agaccctcct tgaaagatgg tgctctggag agccaagata cagagaatgt 2160 

cccagttaca ctatcagagg agaaccgttc tgaaggaaaa gttggttttc aggcctataa 2220 

gaattacttc agagctggtg ctcactggat tgtcttcatt ttccttattc tcctaaacac 2280 

tgcagctcag gttgcctatg tgcttcaaga ttggtggctt tcatactggg caaacaaaca 2340 

aagtatgcta aatgtcactg taaatggagg aggaaatgta accgagaagc tagatcttaa 2400 

ctggtactta ggaatttatt caggtttaac tgtagctacc gttctttttg gcatagcaag 2460 

atctctattg gtattctacg tccttgttaa ctcttcacaa actttgcaca acaaaatgtt 2520 

tgagtcaatt ctgaaagctc cggtattatt ctttgataga aatccaatag gaagaatttt 2580 

aaatcgtttc tccaaagaca ttggacactt ggatgatttg ctgccgctga cgtttttaga 2640 

tttcatccag acattgctac aagtggttgg tgtggtctct gtggctgtgg ccgtgattcc 2700 

ttggatcgca atacccttgg ttccccttgg aatcattttc atttttcttc ggcgatattt 2760 

tttggaaacg tcaagagatg tgaagcgcct ggaatctaca actcggagtc cagtgttttc 2820 

ccacttgtca tcttctctcc aggggctctg gaccatccgg gcatacaaag cagaagagag 2880 

gtgtcaggaa ctgtttgatg cacaccagga tttacattca gaggcttggt tcttgttttt 2940 

gacaacgtcc cgctggttcg ccgtccgtct ggatgccatc tgtgccatgt ttgtcatcat 3000 

cgttgccttt gggtccctga ttctggcaaa aactctggat gccgggcagg ttggtttggc 3060 

actgtcctat gccctcacgc tcatggggat gtttcagtgg tgtgttcgac aaagtgctga 3120 

agttgagaat atgatgatct cagtagaaag ggtcattgaa tacacagacc ttgaaaaaga 3180 

agcaccttgg gaatatcaga aacgcccacc accagcctgg ccccatgaag gagtgataat 3240 

ctttgacaat gtgaacttca tgtacagtcc aggtgggcct ctggtactga agcatctgac 3300 

agcactcatt aaatcacaag aaaaggttgg cattgtggga agaaccggag ctggaaaaag 3360 

ttccctcatc tcagcccttt ttagattgtc agaacccgaa ggtaaaattt ggattgataa 3420 

gatcttgaca actgaaattg gacttcacga tttaaggaag aaaatgtcaa tcatacctca 3480 

ggaacctgtt ttgttcactg gaacaatgag gaaaaacctg gatcccttta aggagcacac 3540 

ggatgaggaa ctgtggaatg ccttacaaga ggtacaactt aaagaaacca ttgaagatct 3600 

tcctggtaaa atggatactg aattagcaga atcaggatcc aattttagtg ttggacaaag 3660 

acaactggtg tgccttgcca gggcaattct caggaaaaat cagatattga ttattgatga 3720 

agcgacggca aatgtggatc caagaactga tgagttaata caaaaaaaaa tccgggagaa 3780 

atttgcccac tgcaccgtgc taaccattgc acacagattg aacaccatta ttgacagcga 3840 

caagataatg gttttagatt caggaagact gaaagaatat gatgagccgt atgttttgct 3900 

gcaaaataaa gagagcctat tttacaagat ggtgcaacaa ctgggcaagg cagaagccgc 3960 

tgccctcact gaaacagcaa aacaggtata cttcaaaaga aattatccac atattggtca 4020 

cactgaccac atggttacaa acacttccaa tggacagccc tcgaccttaa ctattttcga 4080 

gacagcactg tgaatocaac caaaatgtca agtccgttcc gaaggcattt tccactagtt 4140 

tttggactat gtaaaccaca ttgtactttt ttttactttg gcaacaaata tttatacata 4200 

caagatgcta gttcatttga atatttctcc c 4231 

<210> 2 

<211> 1325 

<212> PRT 

<213> Homo sapiens 

<400> 2 



Met Leu Pro Val Tyr Gln Glu Val Lys Pro Asn Pro Leu Gln Asp Ala 

1 5 10 15 

Asn Ile Cys Ser Arg Val Phe Phe Trp Trp Leu Asn Pro Leu Phe Lys 

20 25 30 

Ile Gly His Lys Arg Arg Leu Glu Glu Asp Asp Met Tyr Ser Val Leu 

35 40 45 

Pro Glu Asp Arg Ser Gln His Leu Gly Glu Glu Leu Gln Gly Phe Trp 

50 55 60 

Asp Lys Glu Val Leu Arg Ala Glu Asn Asp Ala Gln Lys Pro Ser Leu 
65 70 75 80 

Thr Arg Ala Ile Ile Lys Cys Tyr Trp Lys Ser Tyr Leu Val Leu Gly 

85 90 95 

Ile Phe Thr Leu Ile Glu Glu Ser Ala Lys Val Ile Gln Pro Ile Phe 

100 105 110 

Leu Gly Lys Ile Ile Asn Tyr Phe Glu Asn Tyr Asp Pro Met Asp Ser 

115 120 125 

Val Ala Leu Asn Thr Ala Tyr Ala Tyr Ala Thr Val Leu Thr Phe Cys 

130 135 140 

Thr Leu Ile Leu Ala Ile Leu His His Leu Tyr Phe Tyr His Val Gln 
145 150 155 160 

Cys Ala Gly Met Arg Leu Arg Val Ala Met Cys His Met Ile Tyr Arg 

165 170 175 

Lys Ala Leu Arg Leu Ser Asn Met Ala Met Gly Lys Thr Thr Thr Gly 

180 185 190 

Gln Ile Val Asn Leu Leu Ser Asn Asp Val Asn Lys Phe Asp Gln Val 

195 200 205 

Thr Val Phe Leu His Phe Leu Trp Ala Gly Pro Leu Gln Ala Ile Ala 

210 215 220 

Val Thr Ala Leu Leu Trp Met Glu Ile Gly Ile Ser Cys Leu Ala Gly 
225 230 235 240 

Met Ala Val Leu Ile Ile Leu Leu Pro Leu Gln Ser Cys Phe Gly Lys 

245 250 255 

Leu Phe Ser Ser Leu Arg Ser Lys Thr Ala Thr Phe Thr Asp Ala Arg 

260 265 270 

Ile Arg Thr Met Asn Glu Val Ile Thr Gly Ile Arg Ile Ile Lys Met 

275 280 285 

Tyr Ala Trp Glu Lys Ser Phe Ser Asn Leu Ile Thr Asn Leu Arg Lys 

290 295 300 

Lys Glu Ile Ser Lys Ile Leu Arg Ser Ser Cys Leu Arg Gly Met Asn 
305 310 315 320 

Leu Ala Ser Phe Phe Ser Ala Ser Lys Ile Ile Val Phe Val Thr Phe 

325 330 335 

Thr Thr Tyr Val Leu Leu Gly Ser Val Ile Thr Ala Ser Arg Val Phe 

340 345 350 

Val Ala Val Thr Leu Tyr Gly Ala Val Arg Leu Thr Val Thr Leu Phe 

355 360 365 

Phe Pro Ser Ala Ile Glu Arg Val Ser Glu Ala Ile Val Ser Ile Arg 

370 375 380 

Arg Ile Gln Thr Phe Leu Leu Leu Asp Glu Ile Ser Gln Arg Asn Arg 
385 390 395 400 

Gln Leu Pro Ser Asp Gly Lys Lys Met Val His Val Gln Asp Phe Thr 

405 410 415 

Ala Phe Trp Asp Lys Ala Ser Glu Thr Pro Thr Leu Gln Gly Leu Ser 

420 425 430 

Phe Thr Val Arg Pro Gly Glu Leu Leu Ala Val Val Gly Pro Val Gly 

435 440 445 

Ala Gly Lys Ser Ser Leu Leu Ser Ala Val Leu Gly Glu Leu Ala Pro 

450 455 460 

Ser His Gly Leu Val Ser Val His Gly Arg Ile Ala Tyr Val Ser Gln 
465 470 475 480 

Gln Pro Trp Val Phe Ser Gly Thr Leu Arg Ser Asn Ile Leu Phe Gly 

485 490 495 

Lys Lys Tyr Glu Lys Glu Arg Tyr Glu Lys Val Ile Lys Ala Cys Ala 

500 505 510 

Leu Lys Lys Asp Leu Gln Leu Leu Glu Asp Gly Asp Leu Thr Val Ile 

515 520 525 

Gly Asp Arg Gly Thr Pro Leu Ser Gly Gly Gln Lys Ala Arg Val Asn 

530 535 540 

Leu Ala Arg Ala Val Tyr Gln Asp Ala Asp Ile Tyr Leu Leu Asp Asp 
545 550 555 560 

Pro Leu Ser Ala Val Asp Ala Glu Val Ser Arg His Leu Phe Glu Leu 

565 570 575 

Cys Ile Cys Gln Ile Leu His Glu Lys Ile Thr Ile Leu Val Thr His 

580 585 590 

Gln Leu Gln Tyr Leu Lys Ala Ala Ser Gln Ile Leu Ile Leu Lys Asp 

595 600 605 

Gly Lys Met Val Gln Lys Gly Thr Tyr Thr Glu Phe Leu Lys Ser Gly 

610 615 620 

Ile Asp Phe Gly Ser Leu Leu Lys Lys Asp Asn Glu Glu Ser Glu Gln 
625 630 635 640 

Pro Pro Val Pro Gly Thr Pro Thr Leu Arg Asn Arg Thr Phe Ser Glu 

645 650 655 

Ser Ser Val Trp Ser Gln Gln Ser Ser Arg Pro Ser Leu Lys Asp Gly 

660 665 670 

Ala Leu Glu Ser Gln Asp Thr Glu Asn Val Pro Val Thr Leu Ser Glu 

675 680 685 

Glu Asn Arg Ser Glu Gly Lys Val Gly Phe Gln Ala Tyr Lys Asn Tyr 

690 695 700 

Phe Arg Ala Gly Ala His Trp Ile Val Phe Ile Phe Leu Ile Leu Leu 
705 710 715 720 

Asn Thr Ala Ala Gln Val Ala Tyr Val Leu Gln Asp Trp Trp Leu Ser 

725 730 735 

Tyr Trp Ala Asn Lys Gln Ser Met Leu Asn Val Thr Val Asn Gly Gly 

740 745 750 

Gly Asn Val Thr Glu Lys Leu Asp Leu Asn Trp Tyr Leu Gly Ile Tyr 

755 760 765 

Ser Gly Leu Thr Val Ala Thr Val Leu Phe Gly Ile Ala Arg Ser Leu 

770 775 780 

Leu Val Phe Tyr Val Leu Val Asn Ser Ser Gln Thr Leu His Asn Lys 
785 790 795 800 

Met Phe Glu Ser Ile Leu Lys Ala Pro Val Leu Phe Phe Asp Arg Asn 

805 810 815 

Pro Ile Gly Arg Ile Leu Asn Arg Phe Ser Lys Asp Ile Gly His Leu 

820 825 830 

Asp Asp Leu Leu Pro Leu Thr Phe Leu Asp Phe Ile Gln Thr Leu Leu 

835 840 845 

Gln Val Val Gly Val Val Ser Val Ala Val Ala Val Ile Pro Trp Ile 

850 855 860 

Ala Ile Pro Leu Val Pro Leu Gly Ile Ile Phe Ile Phe Leu Arg Arg 
865 870 875 880 

Tyr Phe Leu Glu Thr Ser Arg Asp Val Lys Arg Leu Glu Ser Thr Thr 

885 890 895 

Arg Ser Pro Val Phe Ser His Leu Ser Ser Ser Leu Gln Gly Leu Trp 

900 905 910 

Thr Ile Arg Ala Tyr Lys Ala Glu Glu Arg Cys Gln Glu Leu Phe Asp 

915 920 925 

Ala His Gln Asp Leu His Ser Glu Ala Trp Phe Leu Phe Leu Thr Thr 

930 935 940 

Ser Arg Trp Phe Ala Val Arg Leu Asp Ala Ile Cys Ala Met Phe Val 
945 950 955 960 

Ile Ile Val Ala Phe Gly Ser Leu Ile Leu Ala Lys Thr Leu Asp Ala 

965 970 975 

Gly Gln Val Gly Leu Ala Leu Ser Tyr Ala Leu Thr Leu Met Gly Met 

980 985 990 

Phe Gln Trp Cys Val Arg Gln Ser Ala Glu Val Glu Asn Met Met Ile 

995 1000 1005 

Ser Val Glu Arg Val Ile Glu Tyr Thr Asp Leu Glu Lys Glu Ala Pro 

1010 1015 1020 

Trp Glu Tyr Gln Lys Arg Pro Pro Pro Ala Trp Pro His Glu Gly Val 
1025 1030 1035 1040 

Ile Ile Phe Asp Asn Val Asn Phe Met Tyr Ser Pro Gly Gly Pro Leu 

1045 1050 1055 

Val Leu Lys His Leu Thr Ala Leu Ile Lys Ser Gln Glu Lys Val Gly 

1060 1065 1070 

Ile Val Gly Arg Thr Gly Ala Gly Lys Ser Ser Leu Ile Ser Ala Leu 

1075 1080 1085 

Phe Arg Leu Ser Glu Pro Glu Gly Lys Ile Trp Ile Asp Lys Ile Leu 

1090 1095 1100 

Thr Thr Glu Ile Gly Leu His Asp Leu Arg Lys Lys Met Ser Ile Ile 
1105 1110 1115 1120 

Pro Gln Glu Pro Val Leu Phe Thr Gly Thr Met Arg Lys Asn Leu Asp 

1125 1130 1135 

Pro Phe Lys Glu His Thr Asp Glu Glu Leu Trp Asn Ala Leu Arg Glu 

1140 1145 1150 

Val Gln Leu Lys Glu Thr Ile Glu Asp Leu Pro Gly Lys Met Asp Thr 

1155 1160 1165 

Glu Leu Ala Glu Ser Gly Ser Asn Phe Ser Val Gly Gln Arg Gln Leu 

1170 1175 1180 

Val Cys Leu Ala Arg Ala Ile Leu Arg Lys Asn Gln Ile Leu Ile Ile 
1185 1190 1195 1200 

Asp Glu Ala Thr Ala Asn Val Asp Pro Arg Thr Asp Glu Leu Ile Gln 

1205 1210 1215 

Lys Lys Ile Arg Glu Lys Phe Ala His Cys Thr Val Leu Thr Ile Ala 

1220 1225 1230 

His Arg Leu Asn Thr Ile Ile Asp Ser Asp Lys Ile Met Val Leu Asp 
1235 1240 1245 

Ser Gly Arg Leu Lys Glu Tyr Asp Glu Pro Tyr Val Leu Leu Gln Asn 

1250 1255 1260 

Lys Glu Ser Leu Phe Tyr Lys Met Val Gln Gln Leu Gly Lys Ala Glu 
1265 1270 1275 1280 

Ala Ala Ala Leu Thr Glu Thr Ala Lys Gln Val Tyr Phe Lys Arg Asn 

1285 1290 1295 

Tyr Pro His Ile Gly His Thr Asp His Met Val Thr Asn Thr Ser Asn 

1300 1305 1310 

Gly Gln Pro Ser Thr Leu Thr Ile Phe Glu Thr Ala Leu 
1315 1320 1325 

<210> 3 

<211> 5838 

<212> DNA 

<213> Homo sapiens 

<400> 3 

ccgggcaggt ggctcatgct cgggagcgtg gttgagcggc tggcgcggtt gtcctggagc 60 

aggggcgcag gaattctgat gtgaaactaa cagtctgtga gccctggaac ctccgctcag 120 

agaagatgaa ggatatcgac ataggaaaag agtatatcat ccccagtcct gggtatagaa 180 

gtgtgaggga gagaaccagc acttctggga cgcacagaga ccgtgaagat tccaagttca 240 

ggagaactcg accgttggaa tgccaagatg ccttggaaac agcagcccga gccgagggcc 300 

tctctcttga tgcctccatg cattctcagc tcagaatcct ggatgaggag catcccaagg 360 

gaaagtacca tcatggcttg agtgctctga agcccatccg gactacttcc aaacaccagc 420 

acccagtgga caatgctggg cttttttcct gtatgacttt ttcgtggctt tcttctctgg 480 

cccgtgtggc ccacaagaag ggggagctct caatggaaga cgtgtggtct ctgtccaagc 540 

acgagtcttc tgacgtgaac tgcagaagac tagagagact gtggcaagaa gagctgaatg 600 

aagttgggcc agacgctgct tccctgcgaa gggttgtgtg gatcttctgc cgcaccaggc 660 

tcatcctgtc catcgtgtgc ctgatgatca cgcagctggc tggcttcagt ggaccagcct 720 

tcatggtgaa acacctcttg gagtataccc aggcaacaga gtctaacctg cagtacagct 780 

tgttgttagt gctgggcctc ctcctgacgg aaatcgtgcg gtcttggtcg cttgcactga 840 

cttgggcatt gaattaccga accggtgtcc gcttgcgggg ggccatccta accatggcat 900 

ttaagaagat ccttaagtta aagaacatta aagagaaatc cctgggtgag ctcatcaaca 960 

tttgctccaa cgatgggcag agaatgtttg aggcagcagc cgttggcagc ctgctggctg 1020 

gaggacccgt tgttgccatc ttaggcatga tttataatgt aattattctg ggaccaacag 1080 

gcttcctggg atcagctgtt tttatcctct tttacccagc aatgatgttt gcatcacggc 1140 

tcacagcata tttcaggaga aaatgcgtgg ccgccacgga tgaacgtgtc cagaagatga 1200 

atgaagttct tacttacatt aaatttatca aaatgtatgc ctgggtcaaa gcattttctc 1260 

agagtgttca aaaaatccgc gaggaggagc gtcggatatt ggaaaaagcc gggtacttcc 1320 

agggtatcac tgtgggtgtg gctcccattg tggtggtgat tgccagcgtg gtgaccttct 1380 

ctgttcatat gaccctgggc ttcgatctga cagcagcaca ggctttcaca gtggtgacag 1440 

tcttcaattc catgactttt gctttgaaag taacaccgtt ttcagtaaag tccctctcag 1500 

aagcctcagt ggctgttgac agatttaaga gtttgtttct aatggaagag attcacatga 1560 

taaagaacaa accagccagt cctcacatca agatagagat gaaaaatgcc accttggcat 1620 

gggactcctc ccactccagt atccagaact cgcccaagct gacccccaaa atgaaaaaag 1680 

acaagagggc ttccaggggc aagaaagaga aggtgaggca gctgcagcgc actgagcatc 1740 

aggcggtgct ggcagagcag aaaggccacc tcctcctgga cagtgacgag cggcccagtc 1800 

ccgaagagga agaaggcaag cacatccacc tgggccacct gcgcttacag aggacactgc 1860 

acagcatcga tctggagatc caagagggta aactggttgg aatctgcggc agtgtgggaa 1920 

gtggaaaaac ctctctcatt tcagccattt taggccagat gacgcttcta gagggcagca 1980 

ttgcaatcag tggaaccttc gcttatgtgg cccagcaggc ctggatcctc aatgctactc 2040 

tgagagacaa catcctgttt gggaaggaat atgatgaaga aagatacaac tctgtgctga 2100 

acagctgctg cctgaggcct gacctggcca ttcttcccag cagcgacctg acggagattg 2160 

gagagcgagg agccaacctg agcggtgggc agcgccagag gatcagcctt gcccgggcct 2220 

tgtatagtga caggagcatc tacatcctgg acgaccccct cagtgcctta gatgcccatg 2280 

tgggcaacca catcttcaat agtgctatcc ggaaacatct caagtccaag acagttctgt 2340 

ttgttaccca ccagttacag tacctggttg actgtgatga agtgatcttc atgaaagagg 2400 

gctgtattac ggaaagaggc acccatgagg aactgatgaa tttaaatggt gactatgcta 2460 

ccatttttaa taacctgttg ctgggagaga caccgccagt tgagatcaat tcaaaaaagg 2520 

aaaccagtgg ttcacagaag aagtcacaag acaagggtcc taaaacagga tcagtaaaga 2580 

aggaaaaagc agtaaagcca gaggaagggc agcttgtgca gctggaagag aaagggcagg 2640 

gttcagtgcc ctggtcagta tatggtgtct acatccaggc tgctgggggc cccttggcat 2700 

tcctggttat tatggccctt ttcatgctga atgtaggcag caccgccttc agcacctggt 2760 

ggttgagtta ctggatcaag caaggaagcg ggaacaccac tgtgactcga gggaacgaga 2820 

cctcggtgag tgacagcatg aaggacaatc ctcatatgca gtactatgcc agcatctacg 2880 

ccctctccat ggcagtcatg ctgatcctga aagccattcg aggagttgtc tttgtcaagg 2940 

gcacgctgcg agcttcctcc cggctgcatg acgagctttt ccgaaggatc cttcgaagcc 3000 

ctatgaagtt ttttgacacg acccccacag ggaggattct caacaggttt tccaaagaca 3060 

tggatgaagt tgacgtgcgg ctgccgttcc aggccgagat gttcatccag aacgttatcc 3120 

tggtgttctt ctgtgtggga atgatcgcag gagtcttccc gtggttcctt gtggcagtgg 3180 

ggccccttgt catcctcttt tcagtcctgc acattgtctc cagggtcctg attcgggagc 3240 

tgaagcgtct ggacaatatc acgcagtcac ctttcctctc ccacatcacg tccagcatac 3300 

agggccttgc caccatccac gcctacaata aagggcagga gtttctgcac agataccagg 3360 

agctgctgga tgacaaccaa gctccttttt ttttgtttac gtgtgcgatg cggtggctgg 3420 

ctgtgcggct ggacctcatc agcatcgccc tcatcaccac cacggggctg atgatcgttc 3480 

ttatgcacgg gcagattccc ccagcctatg cgggtctcgc catctcttat gctgtccagt 3540 

taacggggct gttccagttt acggtcagac tggcatctga gacagaagct cgattcacct 3600 

cggtggagag gatcaatcac tacattaaga ctctgtcctt ggaagcacct gccagaatta 3660 

agaacaaggc tccctcccct gactggcccc aggagggaga ggtgaccttt gagaacgcag 3720 

agatgaggta ccgagaaaac ctccctcttg tcctaaagaa agtatccttc acgatcaaac 3780 

ctaaagagaa gattggcatt gtggggcgga caggatcagg gaagtcctcg ctggggatgg 3840 

ccctcttccg tctggtggag ttatctggag gctgcatcaa gattgatgga gtgagaatca 3900 

gtgatattgg ccttgccgac ctccgaagca aactctctat cattcctcaa gagccggtgc 3960 

tgttcagtgg cactgtcaga tcaaatttgg accccttcaa ccagtacact gaagaccaga 4020 

tttgggatgc cctggagagg acacacatga aagaatgtat tgctcagcta cctctgaaac 4080 

ttgaatctga agtgatggag aatggggata acttctcagt gggggaacgg cagctcttgt 4140 

gcatagctag agccctgctc cgccactgta agattctgat tttagatgaa gccacagctg 4200 

ccatggacac agagacagac ttattgattc aagagaccat ccgagaagca tttgcagact 4260 

gtaccatgct gaccattgcc catcgcctgc acacggttct aggctccgat aggattatgg 4320 

tgctggccca gggacaggtg gtggagtttg acaccccatc ggtccttctg tccaacgaca 4380 

gttcccgatt ctatgccatg tttgctgctg cagagaacaa ggtcgctgtc aagggctgac 4440 

tcctccctgt tgacgaagtc tcttttcttt agagcattgc cattccctgc ctggggcggg 4500 

cccctcatcg cgtcctccta ccgaaacctt gcctttctcg attttatctt tcgcacagca 4560 

gttccggatt ggcttgtgtg tttcactttt agggagagtc atattttgat tattgtattt 4620 

attccatatt catgtaaaca aaatttagtt tttgttctta attgcactct aaaaggttca 4680 

gggaaccgtt attataattg tatcagaggc ctataatgaa gctttatacg tgtagctata 4740 

tctatatata attctgtaca tagcctatat ttacagtgaa aatgtaagct gtttatttta 4800 

tattaaaata agcactgtgc taataacagt gcatattcct ttctatcatt tttgtacagt 4860 

ttgctgtact agagatctgg ttttgctatt agactgtagg aagagtagca tttcattctt 4920 

ctctagctgg tggtttcacg gtgccaggtt ttctgggtgt ccaaaggaag acgtgtggca 4980 

atagtgggcc ctccgacagc cccctctgcc gcctccccac agccgctcca ggggtggctg 5040 

gagacgggtg ggcggctgga gaccatgcag agcgccgtga gttctcaggg ctcctgcctt 5100 

ctgtcctggt gtcacttact gtttctgtca ggagagcagc ggggcgaagc ccaggcccct 5160 

tttcactccc tccatcaaga atggggatca cagagacatt cctccgagcc ggggagtttc 5220 

tttcctgcct tcttcttttt gctgttgttt ctaaacaaga atcagtctat ccacagagag 5280 

tcccactgcc tcaggttcct atggctggcc actgcacaga gctctccagc tccaagacct 5340 

gttggttcca agccctggag ccaactgctg ctttttgagg tggcactttt tcatttgcct 5400 

attcccacac ctccacagtt cagtggcagg gctcaggatt tcgtgggtct gttttccttt 5460 

ctcaccgcag tcgtcgcaca gtctctctct ctctctcccc tcaaagtctg caactttaag 5520 

cagctcttgc taatcagtgt ctcacactgg cgtagaagtt tttgtactgt aaagagacct 5580 

acctcaggtt gctggttgct gtgtggtttg gtgtgttccc gcaaaccccc tttgtgctgt 5640 

ggggctggta gctcaggtgg gcgtggtcac tgctgtcatc agttgaatgg tcagcgttgc 5700 

atgtcgtgac caactagaca ttctgtcgcc ttagcatgtt tgctgaacac cttgtggaag 5760

caaaaatctg aaaatgtgaa taaaattatt ttggattttg taaaaaaaaa aaaaaaaaaa 5820 

aaaaaaaaaa aaaaaaaa 5838 



<210> 4 
<211> 1437 
<212> PRT 

<213> Homo sapiens 
<400> 4 

Met Lys Asp Ile Asp Ile Gly Lys Glu Tyr Ile Ile Pro Ser Pro Gly

1 5 10 15 

Tyr Arg Ser Val Arg Glu Arg Thr Ser Thr Ser Gly Thr His Arg Asp 

20 25 30 

Arg Glu Asp Ser Lys Phe Arg Arg Thr Arg Pro Leu Glu Cys Gln Asp

35 40 45 

Ala Leu Glu Thr Ala Ala Arg Ala Glu Gly Leu Ser Leu Asp Ala Ser

50 55 60

Met His Ser Gln Leu Arg Ile Leu Asp Glu Glu His Pro Lys Gly Lys
65 70 75 80

Tyr His His Gly Leu Ser Ala Leu Lys Pro Ile Arg Thr Thr Ser Lys

85 90 95 

His Gln His Pro Val Asp Asn Ala Gly Leu Phe Ser Cys Met Thr Phe

100 105 110

Ser Trp Leu Ser Ser Leu Ala Arg Val Ala His Lys Lys Gly Glu Leu

115 120 125

Ser Met Glu Asp Val Trp Ser Leu Ser Lys His Glu Ser Ser Asp Val

130 135 140

Asn Cys Arg Arg Leu Glu Arg Leu Trp Gln Glu Glu Leu Asn Glu Val
145 150 155 160

Gly Pro Asp Ala Ala Ser Leu Arg Arg Val Val Trp Ile Phe Cys Arg

165 170 175

Thr Arg Leu Ile Leu Ser Ile Val Cys Leu Met Ile Thr Gln Leu Ala

180 185 190

Gly Phe Ser Gly Pro Ala Phe Met Val Lys His Leu Leu Glu Tyr Thr

195 200 205

Gln Ala Thr Glu Ser Asn Leu Gln Tyr Ser Leu Leu Leu Val Leu Gly

210 215 220

Leu Leu Leu Thr Glu Ile Val Arg Ser Trp Ser Leu Ala Leu Thr Trp
225 230 235 240 

Ala Leu Asn Tyr Arg Thr Gly Val Arg Leu Arg Gly Ala Ile Leu Thr

245 250 255

Met Ala Phe Lys Lys Ile Leu Lys Leu Lys Asn Ile Lys Glu Lys Ser

260 265 270

Leu Gly Glu Leu Ile Asn Ile Cys Ser Asn Asp Gly Gln Arg Met Phe

275 280 285

Glu Ala Ala Ala Val Gly Ser Leu Leu Ala Gly Gly Pro Val Val Ala

290 295 300

Ile Leu Gly Met Ile Tyr Asn Val Ile Ile Leu Gly Pro Thr Gly Phe
305 310 315 320

Leu Gly Ser Ala Val Phe Ile Leu Phe Tyr Pro Ala Met Met Phe Ala

325 330 335

Ser Arg Leu Thr Ala Tyr Phe Arg Arg Lys Cys Val Ala Ala Thr Asp

340 345 350

Glu Arg Val Gln Lys Met Asn Glu Val Leu Thr Tyr Ile Lys Phe Ile

355 360 365

Lys Met Tyr Ala Trp Val Lys Ala Phe Ser Gln Ser Val Gln Lys Ile

370 375 380

Arg Glu Glu Glu Arg Arg Ile Leu Glu Lys Ala Gly Tyr Phe Gln Gly
385 390 395 400

Ile Thr Val Gly Val Ala Pro Ile Val Val Val Ile Ala Ser Val Val

405 410 415

Thr Phe Ser Val His Met Thr Leu Gly Phe Asp Leu Thr Ala Ala Gln

420 425 430

Ala Phe Thr Val Val Thr Val Phe Asn Ser Met Thr Phe Ala Leu Lys

435 440 445

Val Thr Pro Phe Ser Val Lys Ser Leu Ser Glu Ala Ser Val Ala Val

450 455 460

Asp Arg Phe Lys Ser Leu Phe Leu Met Glu Glu Val His Met Ile Lys
465 470 475 480

Asn Lys Pro Ala Ser Pro His Ile Lys Ile Glu Met Lys Asn Ala Thr

485 490 495

Leu Ala Trp Asp Ser Ser His Ser Ser Ile Gln Asn Ser Pro Lys Leu

500 505 510

Thr Pro Lys Met Lys Lys Asp Lys Arg Ala Ser Arg Gly Lys Lys Glu

515 520 525

Lys Val Arg Gln Leu Gln Arg Thr Glu His Gln Ala Val Leu Ala Glu

530 535 540

Gln Lys Gly His Leu Leu Leu Asp Ser Asp Glu Arg Pro Ser Pro Glu
545 550 555 560 

Glu Glu Glu Gly Lys His Ile His Leu Gly His Leu Arg Leu Gln Arg

565 570 575

Thr Leu His Ser Ile Asp Leu Glu Ile Gln Glu Gly Lys Leu Val Gly

580 585 590

Ile Cys Gly Ser Val Gly Ser Gly Lys Thr Ser Leu Ile Ser Ala Ile

595 600 605

Leu Gly Gln Met Thr Leu Leu Glu Gly Ser Ile Ala Ile Ser Gly Thr

610 615 620

Phe Ala Tyr Val Ala Gln Gln Ala Trp Ile Leu Asn Ala Thr Leu Arg
625 630 635 640

Asp Asn Ile Leu Phe Gly Lys Glu Tyr Asp Glu Glu Arg Tyr Asn Ser

645 650 655

Val Leu Asn Ser Cys Cys Leu Arg Pro Asp Leu Ala Ile Leu Pro Ser

660 665 670

Ser Asp Leu Thr Glu Ile Gly Glu Arg Gly Ala Asn Leu Ser Gly Gly

675 680 685

Gln Arg Gln Arg Ile Ser Leu Ala Arg Ala Leu Tyr Ser Asp Arg Ser

690 695 700

Ile Tyr Ile Leu Asp Asp Pro Leu Ser Ala Leu Asp Ala His Val Gly
705 710 715 720

Asn His Ile Phe Asn Ser Ala Ile Arg Lys His Leu Lys Ser Lys Thr

725 730 735

Val Leu Phe Val Thr His Gln Leu Gln Tyr Leu Val Asp Cys Asp Glu

740 745 750

Val Ile Phe Met Lys Glu Gly Cys Ile Thr Glu Arg Gly Thr His Glu

755 760 765

Glu Leu Met Asn Leu Asn Gly Asp Tyr Ala Thr Ile Phe Asn Asn Leu

770 775 780

Leu Leu Gly Glu Thr Pro Pro Val Glu Ile Asn Ser Lys Lys Glu Thr
785 790 795 800

Ser Gly Ser Gln Lys Lys Ser Gln Asp Lys Gly Pro Lys Thr Gly Ser

805 810 815

Val Lys Lys Glu Lys Ala Val Lys Pro Glu Glu Gly Gln Leu Val Gln

820 825 830

Leu Glu Glu Lys Gly Gln Gly Ser Val Pro Trp Ser Val Tyr Gly Val

835 840 845

Tyr Ile Gln Ala Ala Gly Gly Pro Leu Ala Phe Leu Val Ile Met Ala

850 855 860 

Leu Phe Met Leu Asn Val Gly Ser Thr Ala Phe Ser Thr Trp Trp Leu 
865 870 875 880 

Ser Tyr Trp Ile Lys Gln Gly Ser Gly Asn Thr Thr Val Thr Arg Gly

885 890 895

Asn Glu Thr Ser Val Ser Asp Ser Met Lys Asp Asn Pro His Met Gln

900 905 910

Tyr Tyr Ala Ser Ile Tyr Ala Leu Ser Met Ala Val Met Leu Ile Leu

915 920 925

Lys Ala Ile Arg Gly Val Val Phe Val Lys Gly Thr Leu Arg Ala Ser

930 935 940

Ser Arg Leu His Asp Glu Leu Phe Arg Arg Ile Leu Arg Ser Pro Met
945 950 955 960

Lys Phe Phe Asp Thr Thr Pro Thr Gly Arg Ile Leu Asn Arg Phe Ser

965 970 975

Lys Asp Met Asp Glu Val Asp Val Arg Leu Pro Phe Gln Ala Glu Met

980 985 990

Phe Ile Gln Asn Val Ile Leu Val Phe Phe Cys Val Gly Met Ile Ala

995 1000 1005

Gly Val Phe Pro Trp Phe Leu Val Ala Val Gly Pro Leu Val Ile Leu

1010 1015 1020

Phe Ser Val Leu His Ile Val Ser Arg Val Leu Ile Arg Glu Leu Lys
1025 1030 1035 1040

Arg Leu Asp Asn Ile Thr Gln Ser Pro Phe Leu Ser His Ile Thr Ser

1045 1050 1055

Ser Ile Gln Gly Leu Ala Thr Ile His Ala Tyr Asn Lys Gly Gln Glu

1060 1065 1070

Phe Leu His Arg Tyr Gln Glu Leu Leu Asp Asp Asn Gln Ala Pro Phe

1075 1080 1085

Phe Leu Phe Thr Cys Ala Met Arg Trp Leu Ala Val Arg Leu Asp Leu

1090 1095 1100

Ile Ser Ile Ala Leu Ile Thr Thr Thr Gly Leu Met Ile Val Leu Met
1105 1110 1115 1120

His Gly Gln Ile Pro Pro Ala Tyr Ala Gly Leu Ala Ile Ser Tyr Ala

1125 1130 1135

Val Gln Leu Thr Gly Leu Phe Gln Phe Thr Val Arg Leu Ala Ser Glu

1140 1145 1150

Thr Glu Ala Arg Phe Thr Ser Val Glu Arg Ile Asn His Tyr Ile Lys

1155 1160 1165

Thr Leu Ser Leu Glu Ala Pro Ala Arg Ile Lys Asn Lys Ala Pro Ser

1170 1175 1180

Pro Asp Trp Pro Gln Glu Gly Glu Val Thr Phe Glu Asn Ala Glu Met
1185 1190 1195 1200

Arg Tyr Arg Glu Asn Leu Pro Leu Val Leu Lys Lys Val Ser Phe Thr 

1205 1210 1215 

Ile Lys Pro Lys Glu Lys Ile Gly Ile Val Gly Arg Thr Gly Ser Gly

1220 1225 1230

Lys Ser Ser Leu Gly Met Ala Leu Phe Arg Leu Val Glu Leu Ser Gly

1235 1240 1245

Gly Cys Ile Lys Ile Asp Gly Val Arg Ile Ser Asp Ile Gly Leu Ala

1250 1255 1260

Asp Leu Arg Ser Lys Leu Ser Ile Ile Pro Gln Glu Pro Val Leu Phe
1265 1270 1275 1280

Ser Gly Thr Val Arg Ser Asn Leu Asp Pro Phe Asn Gln Tyr Thr Glu

1285 1290 1295

Asp Gln Ile Trp Asp Ala Leu Glu Arg Thr His Met Lys Glu Cys Ile

1300 1305 1310

Ala Gln Leu Pro Leu Lys Leu Glu Ser Glu Val Met Glu Asn Gly Asp

1315 1320 1325

Asn Phe Ser Val Gly Glu Arg Gln Leu Leu Cys Ile Ala Arg Ala Leu

1330 1335 1340

Leu Arg His Cys Lys Ile Leu Ile Leu Asp Glu Ala Thr Ala Ala Met
1345 1350 1355 1360

Asp Thr Glu Thr Asp Leu Leu Ile Gln Glu Thr Ile Arg Glu Ala Phe

1365 1370 1375

Ala Asp Cys Thr Met Leu Thr Ile Ala His Arg Leu His Thr Val Leu

1380 1385 1390

Gly Ser Asp Arg Ile Met Val Leu Ala Gln Gly Gln Val Val Glu Phe

1395 1400 1405 

Asp Thr Pro Ser Val Leu Leu Ser Asn Asp Ser Ser Arg Phe Tyr Ala 

1410 1415 1420 

Met Phe Ala Ala Ala Glu Asn Lys Val Ala Val Lys Gly 
1425 1430 1435 

<210> 5 

<211> 5079 

<212> DNA 

<213> Homo sapiens 

<400> 5 

ccccatggac gccctgtgcg gttccgggga gctcggctcc aagttctggg actccaacct 60 

gtctgtgcac acagaaaacc cggacctcac tccctgcttc cagaactccc tgctggcctg 120 

ggtgccctgc atctacctgt gggtcgccct gccctgctac ttgctctacc tgcggcacca 180 

ttgtcgtggc tacatcatcc tctcccacct gtccaagctc aagatggtcc tgggtgtcct 240 

gctgtggtgc gtctcctggg cggacctttt ttactccttc catggcctgg tccatggccg 300 

ggcccctgcc cctgttttct ttgtcacccc cttggtggtg ggggtcacca tgctgctggc 360 

caccctgctg atacagtatg agcggctgca gggcgtacag tcttcggggg tcctcattat 420 

cttctggttc ctgtgtgtgg tctgcgccat cgtcccattc cgctccaaga tccttttagc 480 

caaggcagag ggtgagatct cagacccctt ccgcttcacc accttctaca tccactttgc 540 

cctggtactc tctgccctca tcttggcctg cttcagggag aaacctccat ttttctccgc 600 

aaagaatgtc gaccctaacc cctaccctga gaccagcgct ggctttctct cccgcctgtt 660 

tttctggtgg ttcacaaaga tggccatcta tggctaccgg catcccctgg aggagaagga 720 



cctctggtcc ctaaaggaag aggacagatc ccagatggtg gtgcagcagc tgctggaggc 780
atggaggaag caggaaaagc agacggcacg acacaaggct tcagcagcac ctgggaaaaa 840
tgcctccggc gaggacgagg tgctgctggg tgcccggccc aggccccgga agccctcctt 900
cctgaaggcc ctgctggcca ccttcggctc cagcttcctc atcagtgcct gcttcaagct 960
tatccaggac ctgctctcct tcatcaatcc acagctgctc agcatcctga tcaggtttat 1020
ctccaacccc atggccccct cctggtgggg cttcctggtg gctgggctga tgttcctgtg 1080
ctccatgatg cagtcgctga tcttacaaca ctattaccac tacatctttg tgactggggt 1140
gaagtttcgt actgggatca tgggtgtcat ctacaggaag gctctggtta tcaccaactc 1200 
agtcaaacgt gcgtccactg tgggggaaat tgtcaacctc atgtcagtgg atgcccagcg 1260
cttcatggac cttgccccct tcctcaatct gctgtggtca gcacccctgc agatcatcct 1320 
ggcgatctac ttcctctggc agaacctagg tccctctgtc ctggctggag tcgctttcat 1380 
ggtcttgctg attccactca acggagctgt ggccgtgaag atgcgcgcct tccaggtaaa 1440 
gcaaatgaaa ttgaaggact cgcgcatcaa gctgatgagt gagatcctga acggcatcaa 1500 
ggtgctgaag ctgtacgcct gggagcccag cttcctgaag caggtggagg gcatcaggca 1560 
gggtgagctc cagctgctgc gcacggcggc ctacctccac accacaacca ccttcacctg 1620 
gatgtgcagc cccttcctgg tgaccctgat caccctctgg gtgtacgtgt acgtggaccc 1680 
aaacaatgtg ctggacgccg agaaggcctt tgtgtctgtg tccttgttta atatcttaag 1740 
acttcccctc aacatgctgc cccagttaat cagcaacctg actcaggcca gtgtgtctct 1800 
gaaacggatc cagcaattcc tgagccaaga ggaacttgac ccccagagtg tggaaagaaa 1860 
gaccatctcc ccaggctatg ccatcaccat acacagtggc accttcacct gggcccagga 1920
cctgcccccc actctgcaca gcctagacat ccaggtcccg aaaggggcac tggtggccgt 1980 
ggtggggcct gtgggctgtg ggaagtcctc cctggtgtct gccctgctgg gagagatgga 2040 
gaagctagaa ggcaaagtgc acatgaaggg ctccgtggcc tatgtgcccc agcaggcatg 2100

gatccagaac tgcactcttc aggaaaacgt gcttttcggc aaagccctga accccaagcg 2160 

ctaccagcag actctggagg cctgtgcctt gctagctgac ctggagatgc tgcctggtgg 2220 

ggatcagaca gagattggag agaagggcat taacctgtct gggggccagc ggcagcgggt 2280 

cagtctggct cgagctgttt acagtgatgc cgatattttc ttgctggatg acccactgtc 2340 

cgcggtggac tctcatgtgg ccaagcacat ctttgaccac gtcatcgggc cagaaggcgt 2400 

gctggcaggc aagacgcgag tgctggtgac gcacggcatt agcttcctgc cccagacaga 2460

cttcatcatt gtgctagctg atggacaggt gtctgagatg ggcccgtacc cagccctgct 2520 

gcagcgcaac ggctcctttg ccaactttct ctgcaactat gcccccgatg aggaccaagg 2580 

gcacctggag gacagctgga ccgcgttgga aggtgcagag gataaggagg cactgctgat 2640 

tgaagacaca ctcagcaacc acacggatct gacagacaat gatccagtca cctatgtggt 2700 

ccagaagcag tttatgagac agctgagtgc cctgtcctca gatggggagg gacagggtcg 2760

gcctgtaccc cggaggcacc tgggtccatc agagaaggtg caggtgacag aggcgaaggc 2820 

agatggggca ctgacccagg aggagaaagc agccattggc actgtggagc tcagtgtgtt 2880 

ctgggattat gccaaggccg tggggctctg taccacgctg gccatctgtc tcctgtatgt 2940 

gggtcaaagt gcggctgcca ttggagccaa tgtgtggctc agtgcctgga caaatgatgc 3000 

catggcagac agtagacaga acaacacttc cctgaggctg ggcgtctatg ctgctttagg 3060

aattctgcaa gggttcttgg tgatgctggc agccatggcc atggcagcgg gtggcatcca 3120 

ggctgcccgt gtgttgcacc aggcactgct gcacaacaag atacgctcgc cacagtcctt 3180 

ctttgacacc acaccatcag gccgcatcct gaactgcttc tccaaggaca tctatgtcgt 3240 

tgat aggtt ctggcccctg tcatcctcat gctgctcaat tccttcttca acgccatctc 3300 

cactcttgtg gtcatcatgg ccagcacgcc gctcttcact gtggtcatcc tgcccctggc 3360 

tgtgctctac accttagtgc agcgcttcta tgcagccaca tcacggcaac tgaagcggct 3420 

ggaatcagtc agccgctcac ctatctactc ccacttttcg gagacagtga ctggtgccag 3480 

tgtcatccgg gcctacaacc gcagccggga ttttgagatc atcagtgata ctaaggtgga 3540 

tgccaaccag agaagctgct acccctacat catctccaac cggtggctga gcatcggagt 3600 

ggagttcgtg gggaactgcg tggtgctctt tgctgcacta tttgccgtca tcgggaggag 3660

cagcctgaac ccggggctgg tgggcctttc tgtgtcctac tccttgcagg tgacatttgc 3720 

tctgaactgg atgatacgaa tgatgtcaga tttggaatct aacatcgtgg ctgtggagag 3780 

ggtcaaggag tactccaaga cagagacaga ggcgccctgg gtggtggaag gcagccgccc 3840 

tcccgaaggt tggcccccac gtggggaggt ggagttccgg aattattctg tgcgctaccg 3900 

gccgggccta gacctggtgc tgagagacct gagtctgcat gtgcacggtg gcgagaaggt 3960 

ggggatcgtg ggccgcactg gggctggcaa gtcttccatg accctttgcc tgttccgcat 4020 

cctggaggcg gcaaagggtg aaatccgcat tgatggcctc aatgtggcag acatcggcct 4080 

ccatgacctg cgctctcagc tgaccatcat cccgcaggac cccatcctgt tctcggggac 4140 

cctgcgcatg aacctggacc ccttcggcag ctactcagag gaggacattt ggtgggcttt 4200 

ggagctgtcc cacctgcaca cgtttgtgag ctcccagccg gcaggcctgg acttccagtg 4260 

ctcagagggc ggggagaatc tcagcgtggg ccagaggcag ctcgtgtgcc tggcccgagc 4320

cctgctccgc aagagccgca tcctggtttt agacgaggcc acagctgcca tcgacctgga 4380 

gactgacaac ctcatccagg ctaccatccg cacccagttt gatacctgca ctgtcctgac 4440 

catcgcacac cggcttaaca ctatcatgga ctacaccagg gtcctggtcc tggacaaagg 4500 

agtagtagct gaatttgatt ctccagccaa cctcattgca gctagaggca tcttctacgg 4560 

gatggccaga gatgctggac ttgcctaaaa tatattcctg agatttcctc ctggcctttc 4620 

ctggttttca tcaggaagga aatgacacca aatatgtccg cagaatggac ttgatagcaa 4680 

acactggggg caccttaaga ttttgcacct gtaaagtgcc ttacagggta actgtgctga 4740 

atgctttaga tgaggaaatg atccccaagt ggtgaatgac acgcctaagg tcacagctag 4800 

tttgagccag ttagactagt ccccggtctc ccgattccca actgagtgtt atttgcacac 4860 

tgcactgttt tcaaataacg attttatgaa atgacctctg tcctccctct gatttttcat 4920 

attttctaaa gtttcgtttc tgttttttaa taaaaagctt tttcctcctg gaacagaaga 4980 

cagctgctgg gtcaggccac ccctaggaac tcagtcctgt actctggggt gctgcctgaa 5040

tccattaaaa atgggagtac tgatgaaata aaactacag 5079



<210> 6 

<211> 1527 

<212> PRT 

<213> Homo sapiens 

<400> 6 

Met Asp Ala Leu Cys Gly Ser Gly Glu Leu Gly Ser Lys Phe Trp Asp 

1 5 10 15

Ser Asn Leu Ser Val His Thr Glu Asn Pro Asp Leu Thr Pro Cys Phe

20 25 30

Gln Asn Ser Leu Leu Ala Trp Val Pro Cys Ile Tyr Leu Trp Val Ala

35 40 45

Leu Pro Cys Tyr Leu Leu Tyr Leu Arg His His Cys Arg Gly Tyr Ile

50 55 60

Ile Leu Ser His Leu Ser Lys Leu Lys Met Val Leu Gly Val Leu Leu
65 70 75 80

Trp Cys Val Ser Trp Ala Asp Leu Phe Tyr Ser Phe His Gly Leu Val

85 90 95

His Gly Arg Ala Pro Ala Pro Val Phe Phe Val Thr Pro Leu Val Val

100 105 110

Gly Val Thr Met Leu Leu Ala Thr Leu Leu Ile Gln Tyr Glu Arg Leu

115 120 125

Gln Gly Val Gln Ser Ser Gly Val Leu Ile Ile Phe Trp Phe Leu Cys

130 135 140

Val Val Cys Ala Ile Val Pro Phe Arg Ser Lys Ile Leu Leu Ala Lys
145 150 155 160

Ala Glu Gly Glu Ile Ser Asp Pro Phe Arg Phe Thr Thr Phe Tyr Ile

165 170 175

His Phe Ala Leu Val Leu Ser Ala Leu Ile Leu Ala Cys Phe Arg Glu

180 185 190

Lys Pro Pro Phe Phe Ser Ala Lys Asn Val Asp Pro Asn Pro Tyr Pro

195 200 205

Glu Thr Ser Val Gly Phe Leu Ser Arg Leu Phe Phe Trp Trp Phe Thr

210 215 220

Lys Met Ala Ile Tyr Gly Tyr Arg His Pro Leu Glu Glu Lys Asp Leu
225 230 235 240 

Trp Ser Leu Lys Glu Glu Asp Arg Ser Gln Met Val Val Gln Gln Leu

245 250 255

Leu Glu Ala Trp Arg Lys Gln Glu Lys Gln Thr Ala Arg His Lys Ala

260 265 270

Ser Ala Ala Pro Gly Lys Asn Ala Ser Gly Glu Asp Glu Val Leu Leu

275 280 285

Gly Ala Arg Pro Arg Pro Arg Lys Pro Ser Phe Leu Lys Ala Leu Leu

290 295 300

Ala Thr Phe Gly Ser Ser Phe Leu Ile Ser Ala Cys Phe Lys Leu Ile
305 310 315 320

Gln Asp Leu Leu Ser Phe Ile Asn Pro Gln Leu Leu Ser Ile Leu Ile

325 330 335

Arg Phe Ile Ser Asn Pro Met Ala Pro Ser Trp Trp Gly Phe Leu Val

340 345 350

Ala Gly Leu Met Phe Leu Cys Ser Met Met Gln Ser Leu Ile Leu Gln

355 360 365

His Tyr Tyr His Tyr Ile Phe Val Thr Gly Val Lys Phe Arg Thr Gly

370 375 380

Ile Met Gly Val Ile Tyr Arg Lys Ala Leu Val Ile Thr Asn Ser Val
385 390 395 400

Lys Arg Ala Ser Thr Val Gly Glu Ile Val Asn Leu Met Ser Val Asp

405 410 415

Ala Gln Arg Phe Met Asp Leu Ala Pro Phe Leu Asn Leu Leu Trp Ser

420 425 430

Ala Pro Leu Gln Ile Ile Leu Ala Ile Tyr Phe Leu Trp Gln Asn Leu

435 440 445

Gly Pro Ser Val Leu Ala Gly Val Ala Phe Met Val Leu Leu Ile Pro

450 455 460

Leu Asn Gly Ala Val Ala Val Lys Met Arg Ala Phe Gln Val Lys Gln
465 470 475 480

Met Lys Leu Lys Asp Ser Arg Ile Lys Leu Met Ser Glu Ile Leu Asn

485 490 495

Gly Ile Lys Val Leu Lys Leu Tyr Ala Trp Glu Pro Ser Phe Leu Lys

500 505 510

Gln Val Glu Gly Ile Arg Gln Gly Glu Leu Gln Leu Leu Arg Thr Ala

515 520 525

Ala Tyr Leu His Thr Thr Thr Thr Phe Thr Trp Met Cys Ser Pro Phe

530 535 540

Leu Val Thr Leu Ile Thr Leu Trp Val Tyr Val Tyr Val Asp Pro Asn
545 550 555 560 

Asn Val Leu Asp Ala Glu Lys Ala Phe Val Ser Val Ser Leu Phe Asn 

565 570 575 

Ile Leu Arg Leu Pro Leu Asn Met Leu Pro Gln Leu Ile Ser Asn Leu

580 585 590

Thr Gln Ala Ser Val Ser Leu Lys Arg Ile Gln Gln Phe Leu Ser Gln

595 600 605

Glu Glu Leu Asp Pro Gln Ser Val Glu Arg Lys Thr Ile Ser Pro Gly

610 615 620

Tyr Ala Ile Thr Ile His Ser Gly Thr Phe Thr Trp Ala Gln Asp Leu
625 630 635 640

Pro Pro Thr Leu His Ser Leu Asp Ile Gln Val Pro Lys Gly Ala Leu

645 650 655

Val Ala Val Val Gly Pro Val Gly Cys Gly Lys Ser Ser Leu Val Ser

660 665 670

Ala Leu Leu Gly Glu Met Glu Lys Leu Glu Gly Lys Val His Met Lys

675 680 685

Gly Ser Val Ala Tyr Val Pro Gln Gln Ala Trp Ile Gln Asn Cys Thr

690 695 700

Leu Gln Glu Asn Val Leu Phe Gly Lys Ala Leu Asn Pro Lys Arg Tyr
705 710 715 720

Gln Gln Thr Leu Glu Ala Cys Ala Leu Leu Ala Asp Leu Glu Met Leu

725 730 735

Pro Gly Gly Asp Gln Thr Glu Ile Gly Glu Lys Gly Ile Asn Leu Ser

740 745 750

Gly Gly Gln Arg Gln Arg Val Ser Leu Ala Arg Ala Val Tyr Ser Asp

755 760 765

Ala Asp Ile Phe Leu Leu Asp Asp Pro Leu Ser Ala Val Asp Ser His

770 775 780

Val Ala Lys His Ile Phe Asp His Val Ile Gly Pro Glu Gly Val Leu
785 790 795 800

Ala Gly Lys Thr Arg Val Leu Val Thr His Gly Ile Ser Phe Leu Pro

805 810 815

Gln Thr Asp Phe Ile Ile Val Leu Ala Asp Gly Gln Val Ser Glu Met

820 825 830

Gly Pro Tyr Pro Ala Leu Leu Gln Arg Asn Gly Ser Phe Ala Asn Phe

835 840 845

Leu Cys Asn Tyr Ala Pro Asp Glu Asp Gln Gly His Leu Glu Asp Ser

850 855 860

Trp Thr Ala Leu Glu Gly Ala Glu Asp Lys Glu Ala Leu Leu Ile Glu
865 870 875 880 

Asp Thr Leu Ser Asn His Thr Asp Leu Thr Asp Asn Asp Pro Val Thr 

885 890 895 

Tyr Val Val Gln Lys Gln Phe Met Arg Gln Leu Ser Ala Leu Ser Ser

900 905 910

Asp Gly Glu Gly Gln Gly Arg Pro Val Pro Arg Arg His Leu Gly Pro

915 920 925

Ser Glu Lys Val Gln Val Thr Glu Ala Lys Ala Asp Gly Ala Leu Thr

930 935 940

Gln Glu Glu Lys Ala Ala Ile Gly Thr Val Glu Leu Ser Val Phe Trp
945 950 955 960

Asp Tyr Ala Lys Ala Val Gly Leu Cys Thr Thr Leu Ala Ile Cys Leu

965 970 975

Leu Tyr Val Gly Gln Ser Ala Ala Ala Ile Gly Ala Asn Val Trp Leu

980 985 990

Ser Ala Trp Thr Asn Asp Ala Met Ala Asp Ser Arg Gln Asn Asn Thr

995 1000 1005

Ser Leu Arg Leu Gly Val Tyr Ala Ala Leu Gly Ile Leu Gln Gly Phe

1010 1015 1020

Leu Val Met Leu Ala Ala Met Ala Met Ala Ala Gly Gly Ile Gln Ala
1025 1030 1035 1040

Ala Arg Val Leu His Gln Ala Leu Leu His Asn Lys Ile Arg Ser Pro

1045 1050 1055

Gln Ser Phe Phe Asp Thr Thr Pro Ser Gly Arg Ile Leu Asn Cys Phe

1060 1065 1070

Ser Lys Asp Ile Tyr Val Val Asp Glu Val Leu Ala Pro Val Ile Leu

1075 1080 1085

Met Leu Leu Asn Ser Phe Phe Asn Ala Ile Ser Thr Leu Val Val Ile

1090 1095 1100

Met Ala Ser Thr Pro Leu Phe Thr Val Val Ile Leu Pro Leu Ala Val
1105 1110 1115 1120

Leu Tyr Thr Leu Val Gln Arg Phe Tyr Ala Ala Thr Ser Arg Gln Leu

1125 1130 1135

Lys Arg Leu Glu Ser Val Ser Arg Ser Pro Ile Tyr Ser His Phe Ser

1140 1145 1150

Glu Thr Val Thr Gly Ala Ser Val Ile Arg Ala Tyr Asn Arg Ser Arg

1155 1160 1165

Asp Phe Glu Ile Ile Ser Asp Thr Lys Val Asp Ala Asn Gln Arg Ser

1170 1175 1180

Cys Tyr Pro Tyr Ile Ile Ser Asn Arg Trp Leu Ser Ile Gly Val Glu
1185 1190 1195 1200 

Phe Val Gly Asn Cys Val Val Leu Phe Ala Ala Leu Phe Ala Val Ile

1205 1210 1215

Gly Arg Ser Ser Leu Asn Pro Gly Leu Val Gly Leu Ser Val Ser Tyr

1220 1225 1230

Ser Leu Gln Val Thr Phe Ala Leu Asn Trp Met Ile Arg Met Met Ser

1235 1240 1245

Asp Leu Glu Ser Asn Ile Val Ala Val Glu Arg Val Lys Glu Tyr Ser

1250 1255 1260

Lys Thr Glu Thr Glu Ala Pro Trp Val Val Glu Gly Ser Arg Pro Pro
1265 1270 1275 1280

Glu Gly Trp Pro Pro Arg Gly Glu Val Glu Phe Arg Asn Tyr Ser Val

1285 1290 1295

Arg Tyr Arg Pro Gly Leu Asp Leu Val Leu Arg Asp Leu Ser Leu His

1300 1305 1310

Val His Gly Gly Glu Lys Val Gly Ile Val Gly Arg Thr Gly Ala Gly

1315 1320 1325

Lys Ser Ser Met Thr Leu Cys Leu Phe Arg Ile Leu Glu Ala Ala Lys

1330 1335 1340

Gly Glu Ile Arg Ile Asp Gly Leu Asn Val Ala Asp Ile Gly Leu His
1345 1350 1355 1360

Asp Leu Arg Ser Gln Leu Thr Ile Ile Pro Gln Asp Pro Ile Leu Phe

1365 1370 1375

Ser Gly Thr Leu Arg Met Asn Leu Asp Pro Phe Gly Ser Tyr Ser Glu

1380 1385 1390

Glu Asp Ile Trp Trp Ala Leu Glu Leu Ser His Leu His Thr Phe Val

1395 1400 1405

Ser Ser Gln Pro Ala Gly Leu Asp Phe Gln Cys Ser Glu Gly Gly Glu

1410 1415 1420

Asn Leu Ser Val Gly Gln Arg Gln Leu Val Cys Leu Ala Arg Ala Leu
1425 1430 1435 1440

Leu Arg Lys Ser Arg Ile Leu Val Leu Asp Glu Ala Thr Ala Ala Ile

1445 1450 1455

Asp Leu Glu Thr Asp Asn Leu Ile Gln Ala Thr Ile Arg Thr Gln Phe

1460 1465 1470

Asp Thr Cys Thr Val Leu Thr Ile Ala His Arg Leu Asn Thr Ile Met

1475 1480 1485

Asp Tyr Thr Arg Val Leu Val Leu Asp Lys Gly Val Val Ala Glu Phe

1490 1495 1500

Asp Ser Pro Ala Asn Leu Ile Ala Ala Arg Gly Ile Phe Tyr Gly Met
1505 1510 1515 1520 

Ala Arg Asp Ala Gly Leu Ala 
1525 

<210> 7 
<211> 4509 
<212> DNA 

<213> Homo sapiens 
<400> 7 

atggccgcgc ctgctgagcc ctgcgcgggg cagggggtct ggaaccagac agagcctgaa 60 

cctgccgcca ccagcctgct gagcctgtgc ttcctgagaa cagcaggggt ctgggtaccc 120 

cccatgtacc tctgggtcct tggtcccatc tacctcctct tcatccacca ccatggccgg 180 

ggctacctcc ggatgtcccc actcttcaaa gccaagatgg tgcttggatt cgccctcata 240 

gtcctgtgta cctccagcgt ggctgtcgct ctttggaaaa tccaacaggg aacgcctgag 300 

gccccagaat tcctcattca tcctactgtg tggctcacca cgatgagctt cgcagtgttc 360 

ctgattcaca ccgagaggaa aaagggagtc cagtcatctg gagtgctgtt tggttactgg 420 

cttctctgct ttgtcttgcc agctaccaac gctgcccagc aggcctccgg agcgggcttc 480 

cagagcgacc ctgtccgcca cctgtccacc tacctatgcc tgtctctggt ggtggcacag 540 

tttgtgctgt cctgcctggc ggatcaaccc cccttcttcc ctgaagaccc ccagcagtct 600

aacccctgtc cagagactgg ggcagccttc ccctccaaag ccacgttctg gtgggtttct 660 

ggcctggtct ggaggggata caggaggcca ctgagaccaa aagacctctg gtcgcttggg 720 

agagaaaact cctcagaaga acttgtttcc cggcttgaaa aggagtggat gaggaaccgc 780 

agtgcagccc ggaggcacaa caaggcaata gcatttaaaa ggaaaggcgg cagtggcatg 840 

aaggctccag agaccgagcc cttcctacgg caagaaggga gccagtggcg cccactgctg 900 

aaggccatct ggcaggtgtt ccattctacc ttcctcctgg ggaccctcag cctcatcatc 960

agtgatgtct tcaggttcac tgtccccaag ctgctcagcc ttttcctgga gtttattggt 1020 

gatcccaagc ctccagcctg gaagggctac ctcctcgccg tgctgatgtt cctctcagcc 1080 

tgcctgcaaa cgctgtttga gcagcagaac atgtacaggc tcaaggtgcc gcagatgagg 1140 

ttgcggtcgg ccatcactgg cctggtgtac agaaaggtcc tggctctgtc cagcggctcc 1200 

agaaaggcca gtgcggtggg tgatgtggtc aatctggtgt ccgtggacgt gcagcggctg 1260

accgagagcg tcctctacct caacgggctg tggctgcctc tcgtctggat cgtggtctgc 1320 

ttcgtctatc tctggcagct cctggggccc tccgccctca ctgccatcgc tgtcttcctg 1380 

agcctcctcc ctctgaattt cttcatctcc aagaaaagga accaccatca ggaggagcaa 1440 

atgaggcaga aggactcacg ggcacggctc accagctcta tcctcaggaa ctcgaagacc 1500 

atcaagttcc atggctggga gggagccttt ctggacagag tcctgggcat ccgaggccag 1560 

gagctgggcg ccttgcggac ctccggcctc ctcttctctg tgtcgctggt gtccttccaa 1620 

gtgtctacat ttctggtcgc actggtggtg tttgctgtcc acactctggt ggccgagaat 1680 

gctatgaatg cagagaaagc ctttgtgact ctcacagttc tcaacatcct caacaaggcc 1740 

caggctttcc tgcccttctc catccactcc ctcgtccagg cccgggtgtc ctttgaccgt 1800 

ctggtcacct tcctctgcct ggaagaagtt gaccctggtg tcgtagactc aagttcctct 1860 

ggaagcgctg ccgggaagga ttgcatcacc atacacagtg ccaccttcgc ctggtcccag 1920 

gaaagccctc cctgcctcca cagaataaac ctcacggtgc cccagggctg tctgctggct 1980 

gttgtcggtc cagtgggggc agggaagtcc tccctgctgt ccgccctcct tggggagctg 2040 

tcaaaggtgg aggggttcgt gagcatcgag ggtgctgtgg cctacgtgcc ccaggaggcc 2100 

tgggtgcaga acacctctgt ggtagagaat gtgtgcttcg ggcaggagct ggacccaccc 2160 

tggctggaga gagtactaga agcctgtgcc ctgcagccag atgtggacag cttccctgag 2220 

ggaatccaca cttcaattgg ggagcagggc atgaatctct ccggaggcca gaagcagcgg 2280 

ctgagcctgg cccgggctgt atacagaaag gcagctgtgt acctgctgga tgaccccctg 2340 

gcggccctgg atgcccacgt tggccagcat gtcttcaacc aggtcattgg gcctggtggg 2400 

ctactccagg gaacaacacg gattctcgtg acgcacgcac tccacatcct gccccaggct 2460 

gattggatca tagtgctggc aaatggggcc atcgcagaga tgggttccta ccaggagctt 2520 

ctgcagagga agggggccct cgtgtgtctt ctggatcaag ccagacagcc aggagataga 2580 

ggagaaggag aaacagaacc tgggaccagc accaaggacc ccagaggcac ctctgcaggc 2640 

aggaggcccg agcttagacg cgagaggtcc atcaagtcag tccctgagaa ggaccgtacc 2700 

acttcagaag cccagacaga ggttcctctg gatgaccctg acagggcagg atggccagca 2760 

ggaaaggaca gcatccaata cggcagggtg aaggccacag tgcacctggc ctacctgcgt 2820 

gccgtgggca cccccctctg cctctacgca ctcttcctct tcctctgcca gcaagtggcc 2880 

tccttctgcc ggggctactg gctgagcctg tgggcggacg accctgcagt aggtgggcag 2940 

cagacgcagg cagccctgcg tggcgggatc ttcgggctcc tcggctgtct ccaagccatt 3000 

gggctgtttg cctccatggc tgcggtgctc ctaggtgggg cccgggcatc caggttgctc 3060

ttccagaggc tcctgtggga tgtggtgcga tctcccatca gcttctttga gcggacaccc 3120 

attggtcacc tgctaaaccg cttctccaag gagacagaca cggttgacgt ggacattcca 3180 

gacaaactcc ggtccctgct gatgtacgcc tttggactcc tggaggtcag cctggtggtg 3240 

gcagtggcta ccccactggc cactgtggcc atcctgccac tgtttctcct ctacgctggg 3300 
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tttcagagcc tgtatgtggt tagctcatgc cagctgagac gcttggagtc agccagctac 3360 

tcgtctgtct gctcccacat ggctgagacg ttccagggca gcacagtggt ccgggcattc 3420 

cgaacccagg ccccctttgt ggctcagaac aatgctcgcg tagatgaaag ccagaggatc 3480 

agtttcccgc gactggtggc tgacaggtgg cttgcggcca atgtggagct cctggggaat 3540 

ggcctggtgt ttgcagccgc cacgtgtgct gtgctgagca aagcccacct cagtgctggc 3600

ctcgtgggct tctctgtctc tgctgccctc caggtgaccc agacactgca gtgggttgtt 3660 

cgcaactgga cagacctaga gaacagcatc gtgtcagtgg agcggatgca ggactatgcc 3720 

tggacgccca aggaggctcc ctggaggctg cccacatgtg cagctcagcc cccctggcct 3780 

cagggcgggc agatcgagtt ccgggacttt gggctaagat gccgacctga gctcccgctg 3840 

gctgtgcagg gcgtgtcctt caagatccac gcaggagaga aggtgggcat cgttggcagg 3900 

accggggcag ggaagtcctc cctggccagt gggctgctgc ggctccagga ggcagctgag 3960 

ggtgggatct ggatcgacgg ggtccccatt gcccacgtgg ggctgcacac actgcgctcc 4020

aggatcagca tcatccccca ggaccccatc ctgttccctg gctctctgcg gatgaacctc 4080 

gacctgctgc aggagcactc ggacgaggct atctgggcag ccctggagac ggtgcagctc 4140 

aaagccttgg tggccagcct gcccggccag ctgcagtaca agtgtgctga ccgaggcgag 4200 

gacctgagcg tgggccagaa acagctcctg tgtctggcac gtgcccttct ccggaagacc 4260

cagatcctca tcctggacga ggctactgct gccgtggacc ctggcacgga gctgcagatg 4320 

caggccatgc tcgggagctg gtttgcacag tgcactgtgc tgcccattgc ccaccgcctg 4380 

cgctccgtga tggactgtgc ccgggttctg gtcatggaca aggggcaggt ggcagagagc 4440 

ggcagcccgg cccagctgct ggcccagaag ggcctgtttt acagactggc ccaggagtca 4500 

ggcctggtc 4509 

<210> 8 
<211> 1503 
<212> PRT 

<213> Homo sapiens 
<400> 8 

Met Ala Ala Pro Ala Glu Pro Cys Ala Gly Gln Gly Val Trp Asn Gln

1 5 10 15

Thr Glu Pro Glu Pro Ala Ala Thr Ser Leu Leu Ser Leu Cys Phe Leu

20 25 30

Arg Thr Ala Gly Val Trp Val Pro Pro Met Tyr Leu Trp Val Leu Gly

35 40 45

Pro Ile Tyr Leu Leu Phe Ile His His His Gly Arg Gly Tyr Leu Arg

50 55 60

Met Ser Pro Leu Phe Lys Ala Lys Met Val Leu Gly Phe Ala Leu Ile
65 70 75 80

Val Leu Cys Thr Ser Ser Val Ala Val Ala Leu Trp Lys Ile Gln Gln

85 90 95

Gly Thr Pro Glu Ala Pro Glu Phe Leu Ile His Pro Thr Val Trp Leu

100 105 110

Thr Thr Met Ser Phe Ala Val Phe Leu Ile His Thr Glu Arg Lys Lys

115 120 125

Gly Val Gln Ser Ser Gly Val Leu Phe Gly Tyr Trp Leu Leu Cys Phe

130 135 140

Val Leu Pro Ala Thr Asn Ala Ala Gln Gln Ala Ser Gly Ala Gly Phe
145 150 155 160

Gln Ser Asp Pro Val Arg His Leu Ser Thr Tyr Leu Cys Leu Ser Leu

165 170 175

Val Val Ala Gln Phe Val Leu Ser Cys Leu Ala Asp Gln Pro Pro Phe

180 185 190

Phe Pro Glu Asp Pro Gln Gln Ser Asn Pro Cys Pro Glu Thr Gly Ala

195 200 205 

Ala Phe Pro Ser Lys Ala Thr Phe Trp Trp Val Ser Gly Leu Val Trp 

210 215 220 

Arg Gly Tyr Arg Arg Pro Leu Arg Pro Lys Asp Leu Trp Ser Leu Gly 
225 230 235 240 

Arg Glu Asn Ser Ser Glu Glu Leu Val Ser Arg Leu Glu Lys Glu Trp 

245 250 255 

Met Arg Asn Arg Ser Ala Ala Arg Arg His Asn Lys Ala Ile Ala Phe

260 265 270

Lys Arg Lys Gly Gly Ser Gly Met Lys Ala Pro Glu Thr Glu Pro Phe

275 280 285

Leu Arg Gln Glu Gly Ser Gln Trp Arg Pro Leu Leu Lys Ala Ile Trp

290 295 300

Gln Val Phe His Ser Thr Phe Leu Leu Gly Thr Leu Ser Leu Ile Ile
305 310 315 320

Ser Asp Val Phe Arg Phe Thr Val Pro Lys Leu Leu Ser Leu Phe Leu

325 330 335

Glu Phe Ile Gly Asp Pro Lys Pro Pro Ala Trp Lys Gly Tyr Leu Leu

340 345 350

Ala Val Leu Met Phe Leu Ser Ala Cys Leu Gln Thr Leu Phe Glu Gln

355 360 365

Gln Asn Met Tyr Arg Leu Lys Val Pro Gln Met Arg Leu Arg Ser Ala

370 375 380

Ile Thr Gly Leu Val Tyr Arg Lys Val Leu Ala Leu Ser Ser Gly Ser
385 390 395 400

Arg Lys Ala Ser Ala Val Gly Asp Val Val Asn Leu Val Ser Val Asp

405 410 415

Val Gln Arg Leu Thr Glu Ser Val Leu Tyr Leu Asn Gly Leu Trp Leu

420 425 430

Pro Leu Val Trp Ile Val Val Cys Phe Val Tyr Leu Trp Gln Leu Leu

435 440 445

Gly Pro Ser Ala Leu Thr Ala Ile Ala Val Phe Leu Ser Leu Leu Pro

450 455 460

Leu Asn Phe Phe Ile Ser Lys Lys Arg Asn His His Gln Glu Glu Gln
465 470 475 480

Met Arg Gln Lys Asp Ser Arg Ala Arg Leu Thr Ser Ser Ile Leu Arg

485 490 495

Asn Ser Lys Thr Ile Lys Phe His Gly Trp Glu Gly Ala Phe Leu Asp

500 505 510

Arg Val Leu Gly Ile Arg Gly Gln Glu Leu Gly Ala Leu Arg Thr Ser

515 520 525

Gly Leu Leu Phe Ser Val Ser Leu Val Ser Phe Gln Val Ser Thr Phe

530 535 540 

Leu Val Ala Leu Val Val Phe Ala Val His Thr Leu Val Ala Glu Asn 
545 550 555 560 

Ala Met Asn Ala Glu Lys Ala Phe Val Thr Leu Thr Val Leu Asn He 

565 570 575 

Leu Asn Lys Ala Gin Ala Phe Leu Pro Phe Ser He His Ser Leu Val 

580 585 590 

Gin Ala Arg Val Ser Phe Asp Arg Leu Val Thr Phe Leu Cys Leu Glu 

595 600 605 

Glu Val Asp Pro Gly Val Val Asp Ser Ser Ser Ser Gly Ser Ala Ala 

610 615 620 

Gly Lys Asp Cys Ile Thr Ile His Ser Ala Thr Phe Ala Trp Ser Gln
625 630 635 640

Glu Ser Pro Pro Cys Leu His Arg Ile Asn Leu Thr Val Pro Gln Gly

645 650 655 

Cys Leu Leu Ala Val Val Gly Pro Val Gly Ala Gly Lys Ser Ser Leu 

660 665 670 

Leu Ser Ala Leu Leu Gly Glu Leu Ser Lys Val Glu Gly Phe Val Ser 

675 680 685 

Ile Glu Gly Ala Val Ala Tyr Val Pro Gln Glu Ala Trp Val Gln Asn

690 695 700

Thr Ser Val Val Glu Asn Val Cys Phe Gly Gln Glu Leu Asp Pro Pro
705 710 715 720

Trp Leu Glu Arg Val Leu Glu Ala Cys Ala Leu Gln Pro Asp Val Asp

725 730 735

Ser Phe Pro Glu Gly Ile His Thr Ser Ile Gly Glu Gln Gly Met Asn

740 745 750

Leu Ser Gly Gly Gln Lys Gln Arg Leu Ser Leu Ala Arg Ala Val Tyr

755 760 765 

Arg Lys Ala Ala Val Tyr Leu Leu Asp Asp Pro Leu Ala Ala Leu Asp 

770 775 780 

Ala His Val Gly Gln His Val Phe Asn Gln Val Ile Gly Pro Gly Gly
785 790 795 800

Leu Leu Gln Gly Thr Thr Arg Ile Leu Val Thr His Ala Leu His Ile

805 810 815

Leu Pro Gln Ala Asp Trp Ile Ile Val Leu Ala Asn Gly Ala Ile Ala
820 825 830

Glu Met Gly Ser Tyr Gln Glu Leu Leu Gln Arg Lys Gly Ala Leu Val

835 840 845

Cys Leu Leu Asp Gln Ala Arg Gln Pro Gly Asp Arg Gly Glu Gly Glu

850 855 860 

Thr Glu Pro Gly Thr Ser Thr Lys Asp Pro Arg Gly Thr Ser Ala Gly 
865 870 875 880 

Arg Arg Pro Glu Leu Arg Arg Glu Arg Ser Ile Lys Ser Val Pro Glu

885 890 895 

Lys Asp Arg Thr Thr Ser Glu Ala Gln Thr Glu Val Pro Leu Asp Asp

900 905 910 

Pro Asp Arg Ala Gly Trp Pro Ala Gly Lys Asp Ser Ile Gln Tyr Gly

915 920 925 

Arg Val Lys Ala Thr Val His Leu Ala Tyr Leu Arg Ala Val Gly Thr 

930 935 940 

Pro Leu Cys Leu Tyr Ala Leu Phe Leu Phe Leu Cys Gln Gln Val Ala
945 950 955 960 

Ser Phe Cys Arg Gly Tyr Trp Leu Ser Leu Trp Ala Asp Asp Pro Ala 

965 970 975 

Val Gly Gly Gln Gln Thr Gln Ala Ala Leu Arg Gly Gly Ile Phe Gly

980 985 990

Leu Leu Gly Cys Leu Gln Ala Ile Gly Leu Phe Ala Ser Met Ala Ala

995 1000 1005

Val Leu Leu Gly Gly Ala Arg Ala Ser Arg Leu Leu Phe Gln Arg Leu

1010 1015 1020 

Leu Trp Asp Val Val Arg Ser Pro Ile Ser Phe Phe Glu Arg Thr Pro
1025 1030 1035 1040

Ile Gly His Leu Leu Asn Arg Phe Ser Lys Glu Thr Asp Thr Val Asp

1045 1050 1055

Val Asp Ile Pro Asp Lys Leu Arg Ser Leu Leu Met Tyr Ala Phe Gly

1060 1065 1070 

Leu Leu Glu Val Ser Leu Val Val Ala Val Ala Thr Pro Leu Ala Thr 

1075 1080 1085 

Val Ala Ile Leu Pro Leu Phe Leu Leu Tyr Ala Gly Phe Gln Ser Leu

1090 1095 1100

Tyr Val Val Ser Ser Cys Gln Leu Arg Arg Leu Glu Ser Ala Ser Tyr
1105 1110 1115 1120 

Ser Ser Val Cys Ser His Met Ala Glu Thr Phe Gln Gly Ser Thr Val

1125 1130 1135

Val Arg Ala Phe Arg Thr Gln Ala Pro Phe Val Ala Gln Asn Asn Ala

1140 1145 1150

Arg Val Asp Glu Ser Gln Arg Ile Ser Phe Pro Arg Leu Val Ala Asp

1155 1160 1165

Arg Trp Leu Ala Ala Asn Val Glu Leu Leu Gly Asn Gly Leu Val Phe 

1170 1175 1180 

Ala Ala Ala Thr Cys Ala Val Leu Ser Lys Ala His Leu Ser Ala Gly 
1185 1190 1195 1200 

Leu Val Gly Phe Ser Val Ser Ala Ala Leu Gln Val Thr Gln Ala Leu

1205 1210 1215

Gln Trp Val Val Arg Asn Trp Thr Asp Leu Glu Asn Ser Ile Val Ser

1220 1225 1230

Val Glu Arg Met Gln Asp Tyr Ala Trp Thr Pro Lys Glu Ala Pro Trp

1235 1240 1245 

Arg Leu Pro Thr Cys Ala Ala Gln Pro Pro Trp Pro Gln Gly Gly Gln

1250 1255 1260

Ile Glu Phe Arg Asp Phe Gly Leu Arg Tyr Arg Pro Glu Leu Pro Leu
1265 1270 1275 1280

Ala Val Gln Gly Val Ser Leu Lys Ile His Ala Gly Glu Lys Val Gly

1285 1290 1295

Ile Val Gly Arg Thr Gly Ala Gly Lys Ser Ser Leu Ala Ser Gly Leu

1300 1305 1310 

Leu Arg Leu Gln Glu Ala Ala Glu Gly Gly Ile Trp Ile Asp Gly Val

1315 1320 1325

Pro Ile Ala His Val Gly Leu His Thr Leu Arg Ser Arg Ile Ser Ile

1330 1335 1340

Ile Pro Gln Asp Pro Ile Leu Phe Pro Gly Ser Leu Arg Met Asn Leu
1345 1350 1355 1360

Asp Leu Leu Gln Glu His Ser Asp Glu Ala Ile Trp Ala Ala Leu Glu

1365 1370 1375 

Thr Val Gln Leu Lys Ala Leu Val Ala Ser Leu Pro Gly Gln Leu Gln

1380 1385 1390

Tyr Lys Cys Ala Asp Arg Gly Glu Asp Leu Ser Val Gly Gln Lys Gln

1395 1400 1405

Leu Leu Cys Leu Ala Arg Ala Leu Leu Arg Lys Thr Gln Ile Leu Ile

1410 1415 1420

Leu Asp Glu Ala Thr Ala Ala Val Asp Pro Gly Thr Glu Leu Gln Met
1425 1430 1435 1440

Gln Ala Met Leu Gly Ser Trp Phe Ala Gln Cys Thr Val Leu Leu Ile

1445 1450 1455 

Ala His Arg Leu Arg Ser Val Met Asp Cys Ala Arg Val Leu Val Met 

1460 1465 1470 

Asp Lys Gly Gln Val Ala Glu Ser Gly Ser Pro Ala Gln Leu Leu Ala

1475 1480 1485

Gln Lys Gly Leu Phe Tyr Arg Leu Ala Gln Glu Ser Gly Leu Val
1490 1495 1500 

<210> 9 
<211> 18 
<212> DNA 

<213> Artificial Sequence
<220>

<223> Sequence source: /note="synthetic construct"

<400> 9 
ctdgtdgcdg tdgtdggn 

<210> 10 
<211> 19 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence source: /note="synthetic construct"

<400> 10 
atggccgcgc ctgctgagc 

<210> 11 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence source: /note="synthetic construct"

<400> 11 
gtctacgaca ccagggtcaa 

<210> 12 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence source: /note="synthetic construct"

<400> 12 
ctgcctggaa gaagttgacc 

<210> 13 
<211> 20 
<212> DNA

<213> Artificial Sequence
<220> 

<223> Sequence source: /note="synthetic construct"

<400> 13 
ctggaatgtc cacgtcaacc 

<210> 14 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence source: /note="synthetic construct"

<400> 14 
ggagacagac acggttgacg 

<210> 15 
<211> 19 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence source: /note="synthetic construct"

<400> 15 
gcagaccagg cctgactcc 

<210> 16 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence source: /note="synthetic construct"

<400> 16 
rctnavngcn swnarnggnt crtc 

<210> 17 
<211> 29 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence source: /note="synthetic construct"

<400> 17 
cgggatccag rgaraayath ctntttggn 

<210> 18 
<211> 29 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Sequence source: /note="synthetic construct"

<400> 18 
cggaattcnt crtchagnag rtadatrtc